The data file Data_Cortex_Nuclear.xls is assumed to be in a data folder located in the parent directory.
First let’s load the necessary libraries:
Loading data:
path_to_data <- "../data/Data_Cortex_Nuclear.xls"
mice_df <- read_excel(path_to_data)
Structure of the dataset:
str(mice_df)
## tibble [1,080 × 82] (S3: tbl_df/tbl/data.frame)
## $ MouseID : chr [1:1080] "309_1" "309_2" "309_3" "309_4" ...
## $ DYRK1A_N : num [1:1080] 0.504 0.515 0.509 0.442 0.435 ...
## $ ITSN1_N : num [1:1080] 0.747 0.689 0.73 0.617 0.617 ...
## $ BDNF_N : num [1:1080] 0.43 0.412 0.418 0.359 0.359 ...
## $ NR1_N : num [1:1080] 2.82 2.79 2.69 2.47 2.37 ...
## $ NR2A_N : num [1:1080] 5.99 5.69 5.62 4.98 4.72 ...
## $ pAKT_N : num [1:1080] 0.219 0.212 0.209 0.223 0.213 ...
## $ pBRAF_N : num [1:1080] 0.178 0.173 0.176 0.176 0.174 ...
## $ pCAMKII_N : num [1:1080] 2.37 2.29 2.28 2.15 2.13 ...
## $ pCREB_N : num [1:1080] 0.232 0.227 0.23 0.207 0.192 ...
## $ pELK_N : num [1:1080] 1.75 1.6 1.56 1.6 1.5 ...
## $ pERK_N : num [1:1080] 0.688 0.695 0.677 0.583 0.551 ...
## $ pJNK_N : num [1:1080] 0.306 0.299 0.291 0.297 0.287 ...
## $ PKCA_N : num [1:1080] 0.403 0.386 0.381 0.377 0.364 ...
## $ pMEK_N : num [1:1080] 0.297 0.281 0.282 0.314 0.278 ...
## $ pNR1_N : num [1:1080] 1.022 0.957 1.004 0.875 0.865 ...
## $ pNR2A_N : num [1:1080] 0.606 0.588 0.602 0.52 0.508 ...
## $ pNR2B_N : num [1:1080] 1.88 1.73 1.73 1.57 1.48 ...
## $ pPKCAB_N : num [1:1080] 2.31 2.04 2.02 2.13 2.01 ...
## $ pRSK_N : num [1:1080] 0.442 0.445 0.468 0.478 0.483 ...
## $ AKT_N : num [1:1080] 0.859 0.835 0.814 0.728 0.688 ...
## $ BRAF_N : num [1:1080] 0.416 0.4 0.4 0.386 0.368 ...
## $ CAMKII_N : num [1:1080] 0.37 0.356 0.368 0.363 0.355 ...
## $ CREB_N : num [1:1080] 0.179 0.174 0.174 0.179 0.175 ...
## $ ELK_N : num [1:1080] 1.87 1.76 1.77 1.29 1.32 ...
## $ ERK_N : num [1:1080] 3.69 3.49 3.57 2.97 2.9 ...
## $ GSK3B_N : num [1:1080] 1.54 1.51 1.5 1.42 1.36 ...
## $ JNK_N : num [1:1080] 0.265 0.256 0.26 0.26 0.251 ...
## $ MEK_N : num [1:1080] 0.32 0.304 0.312 0.279 0.274 ...
## $ TRKA_N : num [1:1080] 0.814 0.781 0.785 0.734 0.703 ...
## $ RSK_N : num [1:1080] 0.166 0.157 0.161 0.162 0.155 ...
## $ APP_N : num [1:1080] 0.454 0.431 0.423 0.411 0.399 ...
## $ Bcatenin_N : num [1:1080] 3.04 2.92 2.94 2.5 2.46 ...
## $ SOD1_N : num [1:1080] 0.37 0.342 0.344 0.345 0.329 ...
## $ MTOR_N : num [1:1080] 0.459 0.424 0.425 0.429 0.409 ...
## $ P38_N : num [1:1080] 0.335 0.325 0.325 0.33 0.313 ...
## $ pMTOR_N : num [1:1080] 0.825 0.762 0.757 0.747 0.692 ...
## $ DSCR1_N : num [1:1080] 0.577 0.545 0.544 0.547 0.537 ...
## $ AMPKA_N : num [1:1080] 0.448 0.421 0.405 0.387 0.361 ...
## $ NR2B_N : num [1:1080] 0.586 0.545 0.553 0.548 0.513 ...
## $ pNUMB_N : num [1:1080] 0.395 0.368 0.364 0.367 0.352 ...
## $ RAPTOR_N : num [1:1080] 0.34 0.322 0.313 0.328 0.312 ...
## $ TIAM1_N : num [1:1080] 0.483 0.455 0.447 0.443 0.419 ...
## $ pP70S6_N : num [1:1080] 0.294 0.276 0.257 0.399 0.393 ...
## $ NUMB_N : num [1:1080] 0.182 0.182 0.184 0.162 0.16 ...
## $ P70S6_N : num [1:1080] 0.843 0.848 0.856 0.76 0.768 ...
## $ pGSK3B_N : num [1:1080] 0.193 0.195 0.201 0.184 0.186 ...
## $ pPKCG_N : num [1:1080] 1.44 1.44 1.52 1.61 1.65 ...
## $ CDK5_N : num [1:1080] 0.295 0.294 0.302 0.296 0.297 ...
## $ S6_N : num [1:1080] 0.355 0.355 0.386 0.291 0.309 ...
## $ ADARB1_N : num [1:1080] 1.34 1.31 1.28 1.2 1.21 ...
## $ AcetylH3K9_N : num [1:1080] 0.17 0.171 0.185 0.16 0.165 ...
## $ RRP1_N : num [1:1080] 0.159 0.158 0.149 0.166 0.161 ...
## $ BAX_N : num [1:1080] 0.189 0.185 0.191 0.185 0.188 ...
## $ ARC_N : num [1:1080] 0.106 0.107 0.108 0.103 0.105 ...
## $ ERBB4_N : num [1:1080] 0.145 0.15 0.145 0.141 0.142 ...
## $ nNOS_N : num [1:1080] 0.177 0.178 0.176 0.164 0.168 ...
## $ Tau_N : num [1:1080] 0.125 0.134 0.133 0.123 0.137 ...
## $ GFAP_N : num [1:1080] 0.115 0.118 0.118 0.117 0.116 ...
## $ GluR3_N : num [1:1080] 0.228 0.238 0.245 0.235 0.256 ...
## $ GluR4_N : num [1:1080] 0.143 0.142 0.142 0.145 0.141 ...
## $ IL1B_N : num [1:1080] 0.431 0.457 0.51 0.431 0.481 ...
## $ P3525_N : num [1:1080] 0.248 0.258 0.255 0.251 0.252 ...
## $ pCASP9_N : num [1:1080] 1.6 1.67 1.66 1.48 1.53 ...
## $ PSD95_N : num [1:1080] 2.01 2 2.02 1.96 2.01 ...
## $ SNCA_N : num [1:1080] 0.108 0.11 0.108 0.12 0.12 ...
## $ Ubiquitin_N : num [1:1080] 1.045 1.01 0.997 0.99 0.998 ...
## $ pGSK3B_Tyr216_N: num [1:1080] 0.832 0.849 0.847 0.833 0.879 ...
## $ SHH_N : num [1:1080] 0.189 0.2 0.194 0.192 0.206 ...
## $ BAD_N : num [1:1080] 0.123 0.117 0.119 0.133 0.13 ...
## $ BCL2_N : num [1:1080] NA NA NA NA NA NA NA NA NA NA ...
## $ pS6_N : num [1:1080] 0.106 0.107 0.108 0.103 0.105 ...
## $ pCFOS_N : num [1:1080] 0.108 0.104 0.106 0.111 0.111 ...
## $ SYP_N : num [1:1080] 0.427 0.442 0.436 0.392 0.434 ...
## $ H3AcK18_N : num [1:1080] 0.115 0.112 0.112 0.13 0.118 ...
## $ EGR1_N : num [1:1080] 0.132 0.135 0.133 0.147 0.14 ...
## $ H3MeK4_N : num [1:1080] 0.128 0.131 0.127 0.147 0.148 ...
## $ CaNA_N : num [1:1080] 1.68 1.74 1.93 1.7 1.84 ...
## $ Genotype : chr [1:1080] "Control" "Control" "Control" "Control" ...
## $ Treatment : chr [1:1080] "Memantine" "Memantine" "Memantine" "Memantine" ...
## $ Behavior : chr [1:1080] "C/S" "C/S" "C/S" "C/S" ...
## $ class : chr [1:1080] "c-CS-m" "c-CS-m" "c-CS-m" "c-CS-m" ...
Shape of the dataset:
dim(mice_df)
## [1] 1080 82
The dataset has dim(mice_df)[1] rows and dim(mice_df)[2] columns.
We have to take a look at mouse IDs - mice with the same number before underscore are the same.
length(unique(sapply(strsplit(mice_df$MouseID, "_"), "[[", 1)))
## [1] 72
Another solution:
num_mice <- mice_df %>%
separate(MouseID, c("MouseID", "NumberExp")) %>%
summarize(n = length(unique(MouseID)))
num_mice[[1]]
## [1] 72
Number of classes:
length(unique(mice_df$class))
## [1] 8
Classes:
c-CS-s: control mice, stimulated to learn, injected with saline (9 mice)
c-CS-m: control mice, stimulated to learn, injected with memantine (10 mice)
c-SC-s: control mice, not stimulated to learn, injected with saline (9 mice)
c-SC-m: control mice, not stimulated to learn, injected with memantine (10 mice)
t-CS-s: trisomy mice, stimulated to learn, injected with saline (7 mice)
t-CS-m: trisomy mice, stimulated to learn, injected with memantine (9 mice)
t-SC-s: trisomy mice, not stimulated to learn, injected with saline (9 mice)
t-SC-m: trisomy mice, not stimulated to learn, injected with memantine (9 mice)
Table with the number of measurements for each class:
table(mice_df$class)
##
## c-CS-m c-CS-s c-SC-m c-SC-s t-CS-m t-CS-s t-SC-m t-SC-s
## 150 135 150 135 135 105 135 135
As we can see, the numbers range from 105 to 150. Not perfectly balanced.
Let’s see how many mice in each class:
sep_mice_df <- mice_df %>%
separate(MouseID, c("MouseID", "NumberExp")) %>%
group_by(class) %>%
summarise(n = length(unique(MouseID)))
sep_mice_df
## # A tibble: 8 × 2
## class n
## <chr> <int>
## 1 c-CS-m 10
## 2 c-CS-s 9
## 3 c-SC-m 10
## 4 c-SC-s 9
## 5 t-CS-m 9
## 6 t-CS-s 7
## 7 t-SC-m 9
## 8 t-SC-s 9
sum(complete.cases(mice_df))
## [1] 552
Dropping all columns with NA:
complete_mice_df <- mice_df %>%
separate(MouseID, c("MouseID", "NumberExp")) %>%
drop_na()
table(complete_mice_df$class)
##
## c-CS-m c-CS-s c-SC-m c-SC-s t-CS-m t-CS-s t-SC-m t-SC-s
## 45 75 60 75 90 75 60 72
Not really that much data left afterwards - and there are NAs only for some proteins. Therefore, not dropping the NA rows.
Let’s check with One-Way Anova. Null hypothesis - the means of the different groups are the same. Alternative hypothesis - at least one sample mean is not equal to others.
Let’s take a look at the data, whether it is normally distributed:
#Checking if normally distributed
bdnf_n_den_plot <- ggplot(mice_df, aes(BDNF_N, fill=class))+
geom_density(alpha = 0.5)+
scale_x_continuous(name = "BDNF_N") +
facet_grid(class ~ .) +
theme_bw()
bdnf_n_den_plot
bdnf_n_violin_plot <- ggplot(mice_df, aes(x=class, y=BDNF_N, fill=class))+
geom_violin(alpha = 0.5)+
geom_boxplot(width=0.1, color='black', fill='white') +
theme_bw()
bdnf_n_violin_plot
#Plot
classes <- unique(mice_df$class)
for (c in classes){
qqPlot(subset(mice_df, class == c)$BDNF_N, ylab = c)
} # Every class seems to be normally distributed
#Shapiro test for normal distribution
for (c in classes){
print(c)
print(shapiro.test(subset(mice_df, class == c)$BDNF_N))
} # Not normally distributed
## [1] "c-CS-m"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.98526, p-value = 0.1108
##
## [1] "c-SC-m"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.96486, p-value = 0.0007044
##
## [1] "c-CS-s"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.99109, p-value = 0.5488
##
## [1] "c-SC-s"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.98519, p-value = 0.1528
##
## [1] "t-CS-m"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.9755, p-value = 0.01544
##
## [1] "t-SC-m"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.97018, p-value = 0.004635
##
## [1] "t-CS-s"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.92351, p-value = 1.394e-05
##
## [1] "t-SC-s"
##
## Shapiro-Wilk normality test
##
## data: subset(mice_df, class == c)$BDNF_N
## W = 0.9857, p-value = 0.184
Let’s try maybe removing outliers using the 25 % and 75 % quantiles and interquartile range.
list_quant_df <- list()
for (c in classes){
c_mice_df <- subset(mice_df, class == c)
quantiles <- quantile(c_mice_df$BDNF_N, na.rm = T, probs = c(0.25, 0.75))
iqr <- IQR(c_mice_df$BDNF_N, na.rm = T)
lower <- quantiles[1] - 1.5 * iqr
higher <- quantiles[2] + 1.5 * iqr
c_quant_mice_df <- subset(c_mice_df, BDNF_N > lower & BDNF_N < higher)
list_quant_df[[c]] <- c_quant_mice_df
qqPlot(subset(mice_df, class == c)$BDNF_N, ylab = c)
print(c)
print(shapiro.test(c_quant_mice_df$BDNF_N))
}
## [1] "c-CS-m"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.9844, p-value = 0.09497
## [1] "c-SC-m"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.98995, p-value = 0.3765
## [1] "c-CS-s"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.989, p-value = 0.3674
## [1] "c-SC-s"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.98183, p-value = 0.07701
## [1] "t-CS-m"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.97747, p-value = 0.02528
## [1] "t-SC-m"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.97018, p-value = 0.004635
## [1] "t-CS-s"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.96488, p-value = 0.008181
## [1] "t-SC-s"
##
## Shapiro-Wilk normality test
##
## data: c_quant_mice_df$BDNF_N
## W = 0.9857, p-value = 0.184
quant_mice_df <- bind_rows(list_quant_df, .id = "column_label")
ggplot(quant_mice_df, aes(BDNF_N, fill=class))+
geom_density(alpha = 0.5)+
scale_x_continuous(name = "BDNF_N") +
facet_grid(class ~ .) +
theme_bw()
bdnf_n_violin_plot <- ggplot(quant_mice_df, aes(x=class, y=BDNF_N, fill=class))+
geom_violin(alpha = 0.5)+
geom_boxplot(width=0.1, color='black', fill='white') +
theme_bw()
bdnf_n_violin_plot
Okay, after the filtering step - there are still some cases where the distribution is not normal according to the Shapiro-Wilk test, but the data seems to be okay in general.
Let’s take a look at the data structure:
# Original data
mice_df %>%
group_by(class) %>%
summarise(
count = n(),
mean = mean(BDNF_N, na.rm = TRUE),
sd = sd(BDNF_N, na.rm = TRUE)
)
## # A tibble: 8 × 4
## class count mean sd
## <chr> <int> <dbl> <dbl>
## 1 c-CS-m 150 0.339 0.0469
## 2 c-CS-s 135 0.342 0.0549
## 3 c-SC-m 150 0.291 0.0386
## 4 c-SC-s 135 0.313 0.0436
## 5 t-CS-m 135 0.313 0.0512
## 6 t-CS-s 105 0.305 0.0433
## 7 t-SC-m 135 0.321 0.0332
## 8 t-SC-s 135 0.326 0.0575
# Quantile-filtered
quant_mice_df %>%
group_by(class) %>%
summarise(
count = n(),
mean = mean(BDNF_N, na.rm = TRUE),
sd = sd(BDNF_N, na.rm = TRUE)
)
## # A tibble: 8 × 4
## class count mean sd
## <chr> <int> <dbl> <dbl>
## 1 c-CS-m 147 0.338 0.0437
## 2 c-CS-s 134 0.341 0.0534
## 3 c-SC-m 147 0.292 0.0340
## 4 c-SC-s 131 0.313 0.0407
## 5 t-CS-m 134 0.312 0.0496
## 6 t-CS-s 102 0.301 0.0360
## 7 t-SC-m 135 0.321 0.0332
## 8 t-SC-s 132 0.326 0.0575
Let’s take a look at variances:
# Original data
mice_df$class <- factor(mice_df$class)
total <- ggplot(mice_df, aes(x = class, y = BDNF_N))+
geom_linerange(aes(x=class, ymax=BDNF_N, ymin = mean(mice_df$BDNF_N, na.rm = T)), size = 1,color = "grey", position = position_jitter(width = 0.1, seed = 1L))+
geom_hline(yintercept = mean(mice_df$BDNF_N, na.rm = T))+
geom_point(position = position_jitter(width = 0.1, seed = 1L), size=0.2) +
ggtitle("Total \n variance")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
gr_mean<-mice_df %>%
group_by(class) %>%
summarise(mean = mean(BDNF_N, na.rm = T))
two_pic<-right_join(x = mice_df, y = gr_mean)
hline <- data.frame(class=levels(mice_df$class), v=gr_mean$mean)
resid<-ggplot(two_pic, aes(x = class, y = BDNF_N)) +
geom_linerange(aes(x = class, ymax = BDNF_N, ymin = mean), size = 1,color = "green", position = position_jitter(width = 0.1, seed = 1L)) +
geom_point(position = position_jitter(width = 0.1, seed = 1L), size=0.2)+
geom_point(data=hline, aes(class, v), shape=95, size=15) +
ggtitle("Variance \n within")+theme(axis.text.x = element_text(angle = 45, hjust = 1))
factor <- ggplot() +
geom_linerange(data = gr_mean, aes(x= class, ymax = mean, ymin = mean(mice_df$BDNF_N, na.rm = T)), color = "blue", size = 2)+
geom_point(data = two_pic, aes(x = class, y = BDNF_N), position = position_jitter(width = 0.2, seed = 1L), size=0.2) +
geom_point(data=hline, aes(class, v), shape=95, size=15) +
geom_hline(yintercept = mean(mice_df$BDNF_N, na.rm = T)) +
ggtitle("Variance \n between")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
grid.arrange(total, factor, resid, nrow = 1)
# Quantile-filtered
quant_mice_df$class <- factor(quant_mice_df$class)
total <- ggplot(quant_mice_df, aes(x = class, y = BDNF_N))+
geom_linerange(aes(x=class, ymax=BDNF_N, ymin = mean(quant_mice_df$BDNF_N)), size = 1,color = "grey", position = position_jitter(width = 0.1, seed = 1L))+
geom_hline(yintercept = mean(quant_mice_df$BDNF_N))+
geom_point(position = position_jitter(width = 0.1, seed = 1L), size=0.2) +
ggtitle("Total \n variance")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
gr_mean<-quant_mice_df %>% group_by(class) %>% summarise(mean = mean(BDNF_N))
two_pic<-right_join(x = quant_mice_df, y = gr_mean)
hline <- data.frame(class=levels(quant_mice_df$class), v=gr_mean$mean)
resid<-ggplot(two_pic, aes(x = class, y = BDNF_N)) +
geom_linerange(aes(x = class, ymax = BDNF_N, ymin = mean), size = 1,color = "green", position = position_jitter(width = 0.1, seed = 1L)) +
geom_point(position = position_jitter(width = 0.1, seed = 1L), size=0.2)+
geom_point(data=hline, aes(class, v), shape=95, size=15) +
ggtitle("Variance \n within")+theme(axis.text.x = element_text(angle = 45, hjust = 1))
factor <- ggplot() +
geom_linerange(data = gr_mean, aes(x= class, ymax = mean, ymin = mean(quant_mice_df$BDNF_N)), color = "blue", size = 2)+
geom_point(data = two_pic, aes(x = class, y = BDNF_N), position = position_jitter(width = 0.2, seed = 1L), size=0.2) +
geom_point(data=hline, aes(class, v), shape=95, size=15) +
geom_hline(yintercept = mean(quant_mice_df$BDNF_N)) +
ggtitle("Variance \n between")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
grid.arrange(total, factor, resid, nrow = 1)
After visualizing the variances, we can use the F-test of ANOVA:
# Original data
or_res.aov <- aov(BDNF_N ~ class, data = mice_df)
or_aov <- summary(or_res.aov)
# Quantile-fitted
quant_res.aov <- aov(BDNF_N ~ class, data = quant_mice_df)
quant_aov <- summary(quant_res.aov)
Based on this there is a significant difference between the classes in the production of BDNF_N (whether the outliers are removed or not) with p-value of or_aov[[1]][["Pr(>F)"]] for original data and quant_rov[[1]][["Pr(>F)"]] for the .
Other way to do ANOVA test. First we build a linear model and then perform the ANOVA test:
# Original data
mod_mice <- lm(BDNF_N ~ class, data = mice_df)
summary(mod_mice)
##
## Call:
## lm(formula = BDNF_N ~ class, data = mice_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.175764 -0.028777 -0.001609 0.028701 0.159388
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.339217 0.003817 88.871 < 2e-16 ***
## classc-CS-s 0.003098 0.005546 0.559 0.5766
## classc-SC-m -0.048272 0.005398 -8.942 < 2e-16 ***
## classc-SC-s -0.025825 0.005546 -4.657 3.62e-06 ***
## classt-CS-m -0.026485 0.005546 -4.776 2.04e-06 ***
## classt-CS-s -0.033757 0.005948 -5.675 1.78e-08 ***
## classt-SC-m -0.018154 0.005546 -3.273 0.0011 **
## classt-SC-s -0.013631 0.005579 -2.443 0.0147 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04675 on 1069 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.1097, Adjusted R-squared: 0.1039
## F-statistic: 18.82 on 7 and 1069 DF, p-value: < 2.2e-16
mice_anova <- Anova(mod_mice)
mice_anova
## Anova Table (Type II tests)
##
## Response: BDNF_N
## Sum Sq Df F value Pr(>F)
## class 0.28784 7 18.816 < 2.2e-16 ***
## Residuals 2.33619 1069
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Quantile-fitted
quant_mod_mice <- lm(BDNF_N ~ class, data = quant_mice_df)
summary(quant_mod_mice)
##
## Call:
## lm(formula = BDNF_N ~ class, data = quant_mice_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.136228 -0.028150 -0.001234 0.028660 0.134257
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.338311 0.003666 92.288 < 2e-16 ***
## classc-CS-s 0.002849 0.005309 0.537 0.59166
## classc-SC-m -0.046181 0.005184 -8.908 < 2e-16 ***
## classc-SC-s -0.024907 0.005340 -4.664 3.50e-06 ***
## classt-CS-m -0.026728 0.005309 -5.035 5.62e-07 ***
## classt-CS-s -0.037090 0.005728 -6.476 1.44e-10 ***
## classt-SC-m -0.017248 0.005298 -3.255 0.00117 **
## classt-SC-s -0.012725 0.005330 -2.388 0.01713 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04445 on 1054 degrees of freedom
## Multiple R-squared: 0.1172, Adjusted R-squared: 0.1114
## F-statistic: 20 on 7 and 1054 DF, p-value: < 2.2e-16
quant_mice_anova <- Anova(mod_mice)
quant_mice_anova
## Anova Table (Type II tests)
##
## Response: BDNF_N
## Sum Sq Df F value Pr(>F)
## class 0.28784 7 18.816 < 2.2e-16 ***
## Residuals 2.33619 1069
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The production of BDNF_N protein significantly depends on the class in the experiment (F =18.8160539 , p_value = 8.8566341^{-24}, df_1 = 7, df_2 = 1069).
ANOVA asssumptions:
Let’s check for outliers:
mod_diag <- fortify(mod_mice)
ggplot(mod_diag, aes(x = 1:nrow(mod_diag), y = .cooksd)) +
geom_bar(stat = "identity") +
ggtitle("Cook's distance") +
xlab("Number of the observation") +
ylab("Cook's D")
quant_mod_diag <- fortify(quant_mod_mice)
ggplot(quant_mod_diag, aes(x = 1:nrow(quant_mod_diag), y = .cooksd)) +
geom_bar(stat = "identity") +
ggtitle("Cook's distance") +
xlab("Number of the observation") +
ylab("Cook's D")
For original data there are some relatively high Cook’s D values, but they’re still less than 0.2, so not so high. For quantile data the distances are very low.
ggplot(mod_diag, aes(x = class, y = .stdresid, fill=class)) +
geom_violin(alpha=0.5) +
geom_boxplot(width=0.1, color='black', fill='white') +
ggtitle("Graph of residuals") +
theme_bw()
ggplot(quant_mod_diag, aes(x = class, y = .stdresid, fill=class)) +
geom_violin(alpha=0.5) +
geom_boxplot(width=0.1, color='black', fill='white') +
ggtitle("Graph of residuals") +
theme_bw()
Testing normal distribution of the residuals:
# QQPlot
qqPlot(mod_mice$residuals, id = FALSE, main = "Quantile plot of residuals")
# Shapiro test for normal distribution
shapiro.test(mod_mice$residuals)
##
## Shapiro-Wilk normality test
##
## data: mod_mice$residuals
## W = 0.99609, p-value = 0.007949
Based on the Shapiro=Wilk test the residuals are not normally distributed.
Checking for common variance with residuals vesus fits plot and Levene’s test for homogeneity of variance:
# Checking the homogeneity of variance of original
plot(or_res.aov, 1)
# Checking the homogeneity of variance of quantile
plot(quant_res.aov, 1)
# Original data
lev <- leveneTest(BDNF_N ~ class, mice_df)
lev$`Pr(>F)`[1]
## [1] 5.172435e-10
# Quantile-filtered
lev <- leveneTest(BDNF_N ~ class, quant_mice_df)
lev$`Pr(>F)`[1]
## [1] 6.666303e-14
Based on Levene’s test the groups (independent of whether the outliers are removed) don’t have the same variances - this actually breaks one of the conditions for the ANOVA test.
Let’s try Welch one-way test tat does not make the assumption of equal variances:
mice_welch <- oneway.test(BDNF_N ~ class, data = mice_df)
Based on this the production of BDNF_N protein still significantly depends on the class in the experiment (F =20.5225485 , p_value = 5.8657162^{-24}, df_1 = 7, df_2 = 450.101448).
Post-hoc test of Tukey HSD:
post_hoc <- glht(mod_mice, linfct = mcp(class = "Tukey"))
result<-summary(post_hoc)
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
result
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lm(formula = BDNF_N ~ class, data = mice_df)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## c-CS-s - c-CS-m == 0 0.0030979 0.0055459 0.559 0.99930
## c-SC-m - c-CS-m == 0 -0.0482717 0.0053980 -8.942 < 0.001 ***
## c-SC-s - c-CS-m == 0 -0.0258249 0.0055459 -4.657 < 0.001 ***
## t-CS-m - c-CS-m == 0 -0.0264852 0.0055459 -4.776 < 0.001 ***
## t-CS-s - c-CS-m == 0 -0.0337570 0.0059483 -5.675 < 0.001 ***
## t-SC-m - c-CS-m == 0 -0.0181541 0.0055459 -3.273 0.02444 *
## t-SC-s - c-CS-m == 0 -0.0136310 0.0055790 -2.443 0.22129
## c-SC-m - c-CS-s == 0 -0.0513696 0.0055459 -9.263 < 0.001 ***
## c-SC-s - c-CS-s == 0 -0.0289228 0.0056900 -5.083 < 0.001 ***
## t-CS-m - c-CS-s == 0 -0.0295831 0.0056900 -5.199 < 0.001 ***
## t-CS-s - c-CS-s == 0 -0.0368549 0.0060829 -6.059 < 0.001 ***
## t-SC-m - c-CS-s == 0 -0.0212520 0.0056900 -3.735 0.00495 **
## t-SC-s - c-CS-s == 0 -0.0167289 0.0057223 -2.923 0.06896 .
## c-SC-s - c-SC-m == 0 0.0224468 0.0055459 4.047 0.00145 **
## t-CS-m - c-SC-m == 0 0.0217865 0.0055459 3.928 0.00225 **
## t-CS-s - c-SC-m == 0 0.0145147 0.0059483 2.440 0.22310
## t-SC-m - c-SC-m == 0 0.0301176 0.0055459 5.431 < 0.001 ***
## t-SC-s - c-SC-m == 0 0.0346406 0.0055790 6.209 < 0.001 ***
## t-CS-m - c-SC-s == 0 -0.0006603 0.0056900 -0.116 1.00000
## t-CS-s - c-SC-s == 0 -0.0079321 0.0060829 -1.304 0.89725
## t-SC-m - c-SC-s == 0 0.0076708 0.0056900 1.348 0.87975
## t-SC-s - c-SC-s == 0 0.0121939 0.0057223 2.131 0.39548
## t-CS-s - t-CS-m == 0 -0.0072718 0.0060829 -1.195 0.93317
## t-SC-m - t-CS-m == 0 0.0083311 0.0056900 1.464 0.82603
## t-SC-s - t-CS-m == 0 0.0128542 0.0057223 2.246 0.32384
## t-SC-m - t-CS-s == 0 0.0156029 0.0060829 2.565 0.16962
## t-SC-s - t-CS-s == 0 0.0201260 0.0061130 3.292 0.02281 *
## t-SC-s - t-SC-m == 0 0.0045231 0.0057223 0.790 0.99360
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Based on the table above, there are some significant differences between classes in production of BDNF_N (marked with asterisks).
The level of production of BDNF_N in following classes is significantly lower than the level of production in c-CS-m:
The level of production of BDNF_N in following classes is significantly lower than the level of production in c-CS-m:
The level of production of BDNF_N in following classes is significantly higher than the level of production in c-SC-m:
The level of production of BDNF_N in following classes is significantly higher than the level of production in t-CS-s:
NAs for each column and row:
# Columns
col_na_counts <- data.frame(colSums(is.na(mice_df)))
col_na_counts[order(col_na_counts), ]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [20] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3 3 3 3
## [39] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [58] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 7 18 18
## [77] 75 180 210 213 270 285
# As we can see there are some columns with a lot of NA values. I will remove the ones with more than 20 NAs
# Rows
row_na_counts <- data.frame(rowSums(is.na(mice_df)))
row_na_counts[order(row_na_counts), ][900:1080]
## [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [26] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [51] 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [76] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [101] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [126] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [151] 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [176] 5 5 5 43 43 43
# There are 3 rows with 43 NAs, so I will remove those
less_na_mice_df <- mice_df[rowSums(is.na(mice_df)) <= 5 , colSums(is.na(mice_df)) <= 5]
Now we take only numeric values:
less_na_mice_df <- as.data.frame(select_if(less_na_mice_df, is.numeric))
As we can see, there are some columns that
Let’s try splitting the data into training and test:
sample <- sample.split(less_na_mice_df, SplitRatio = 0.8)
training_data <- subset(less_na_mice_df, sample == TRUE)
test_data <- subset(less_na_mice_df, sample == FALSE)
Normalizing the data:
# var_training_data <- training_data[, c(-17, -18)]
# var_test_data <- test_data[, c(-17, -18)]
var_training_data <- dplyr::select(training_data, -ERBB4_N)
var_test_data <- dplyr::select(test_data, -ERBB4_N)
normParam <- caret::preProcess(var_training_data, na.remove = TRUE)
norm_training_data <- predict(normParam, var_training_data)
norm_test_data <- predict(normParam, var_test_data)
psych::describe(norm_training_data)
## vars n mean sd median trimmed mad min max range skew
## DYRK1A_N 1 856 0 1 -0.24 -0.15 0.53 -1.08 8.35 9.43 4.27
## ITSN1_N 2 856 0 1 -0.20 -0.14 0.62 -1.40 7.82 9.22 3.60
## BDNF_N 3 856 0 1 -0.04 -0.03 0.91 -4.06 3.55 7.61 0.29
## NR1_N 4 856 0 1 0.01 0.00 1.01 -2.72 4.14 6.86 0.04
## NR2A_N 5 856 0 1 -0.11 -0.05 1.00 -2.23 4.94 7.17 0.51
## pAKT_N 6 856 0 1 -0.04 -0.03 0.94 -4.12 4.95 9.07 0.23
## pBRAF_N 7 856 0 1 0.00 -0.02 0.91 -4.32 4.82 9.14 0.10
## pCAMKII_N 8 856 0 1 -0.14 -0.07 1.14 -1.68 3.00 4.69 0.51
## pCREB_N 9 856 0 1 -0.06 -0.01 1.03 -3.03 2.87 5.91 0.16
## pELK_N 10 856 0 1 -0.15 -0.11 0.54 -2.07 9.74 11.81 4.07
## pERK_N 11 856 0 1 -0.30 -0.16 0.55 -1.15 8.70 9.85 3.73
## pJNK_N 12 856 0 1 0.17 0.05 0.91 -4.89 3.41 8.30 -0.82
## PKCA_N 13 856 0 1 -0.09 -0.03 1.00 -2.38 2.94 5.32 0.28
## pMEK_N 14 856 0 1 0.03 -0.01 0.93 -4.62 3.91 8.53 -0.04
## pNR1_N 15 856 0 1 -0.05 -0.02 0.97 -2.71 4.88 7.60 0.24
## pNR2A_N 16 856 0 1 -0.04 -0.03 1.01 -2.36 3.65 6.01 0.34
## pNR2B_N 17 856 0 1 0.01 0.00 1.01 -4.60 4.28 8.88 -0.13
## pPKCAB_N 18 856 0 1 -0.32 -0.07 0.89 -1.96 2.89 4.86 0.61
## pRSK_N 19 856 0 1 -0.02 0.01 0.83 -5.14 3.10 8.24 -0.50
## AKT_N 20 856 0 1 0.01 0.00 0.94 -4.74 3.86 8.61 -0.17
## BRAF_N 21 856 0 1 -0.24 -0.16 0.48 -1.08 8.05 9.13 4.46
## CAMKII_N 22 856 0 1 -0.06 -0.03 0.88 -2.81 4.20 7.01 0.43
## CREB_N 23 856 0 1 -0.03 -0.05 0.93 -2.44 5.24 7.69 0.65
## ERK_N 24 856 0 1 -0.10 -0.07 0.95 -2.03 4.17 6.19 0.69
## GSK3B_N 25 856 0 1 -0.04 -0.03 0.86 -2.59 5.33 7.91 1.07
## JNK_N 26 856 0 1 0.10 0.04 0.90 -5.56 4.20 9.76 -0.92
## TRKA_N 27 856 0 1 0.10 0.03 0.97 -3.92 2.54 6.46 -0.37
## RSK_N 28 856 0 1 -0.06 -0.06 0.92 -2.00 4.89 6.90 0.72
## APP_N 29 856 0 1 -0.05 -0.02 0.92 -2.59 3.67 6.27 0.28
## SOD1_N 30 856 0 1 -0.34 -0.14 0.83 -1.18 4.88 6.06 1.19
## MTOR_N 31 856 0 1 0.00 -0.02 0.93 -3.81 3.10 6.90 0.05
## P38_N 32 856 0 1 -0.09 -0.06 0.95 -2.05 5.91 7.96 0.83
## pMTOR_N 33 856 0 1 0.00 0.02 0.92 -4.74 3.01 7.76 -0.63
## DSCR1_N 34 856 0 1 -0.08 -0.04 0.79 -4.07 3.31 7.39 0.06
## AMPKA_N 35 856 0 1 -0.16 -0.08 0.82 -2.24 5.26 7.50 0.87
## NR2B_N 36 856 0 1 -0.02 -0.01 0.84 -4.09 4.61 8.70 -0.06
## pNUMB_N 37 856 0 1 -0.14 -0.07 0.90 -2.70 4.33 7.04 0.71
## RAPTOR_N 38 856 0 1 -0.20 -0.09 0.89 -2.23 3.91 6.14 0.89
## TIAM1_N 39 856 0 1 -0.17 -0.09 0.89 -2.66 4.47 7.13 0.93
## pP70S6_N 40 856 0 1 -0.10 -0.08 0.92 -1.68 4.44 6.12 0.93
## NUMB_N 41 856 0 1 -0.10 -0.07 0.97 -2.18 4.36 6.53 0.76
## P70S6_N 42 856 0 1 -0.11 -0.04 0.94 -3.45 3.85 7.30 0.39
## pGSK3B_N 43 856 0 1 -0.04 -0.04 0.86 -3.19 4.36 7.56 0.51
## pPKCG_N 44 856 0 1 -0.06 -0.03 1.03 -1.93 2.93 4.86 0.24
## CDK5_N 45 856 0 1 0.05 0.01 0.88 -3.24 3.40 6.64 -0.10
## S6_N 46 856 0 1 -0.24 -0.05 1.08 -2.18 2.86 5.04 0.44
## ADARB1_N 47 856 0 1 -0.19 -0.11 0.90 -1.82 3.64 5.46 1.02
## AcetylH3K9_N 48 856 0 1 -0.37 -0.20 0.45 -0.87 6.51 7.39 2.74
## RRP1_N 49 856 0 1 -0.15 -0.11 0.65 -1.59 14.17 15.76 4.90
## BAX_N 50 856 0 1 0.07 0.04 0.90 -5.71 2.38 8.10 -0.65
## ARC_N 51 856 0 1 0.01 0.00 1.08 -3.74 2.56 6.29 -0.08
## nNOS_N 52 856 0 1 0.05 0.02 0.95 -3.29 3.19 6.48 -0.18
## Tau_N 53 856 0 1 -0.31 -0.16 0.60 -1.63 5.53 7.15 1.89
## GFAP_N 54 856 0 1 -0.04 -0.03 0.82 -2.63 6.99 9.61 0.94
## GluR3_N 55 856 0 1 -0.13 -0.04 1.07 -2.86 3.17 6.04 0.32
## GluR4_N 56 856 0 1 -0.11 -0.05 0.83 -1.95 14.70 16.65 5.27
## IL1B_N 57 856 0 1 0.00 -0.01 0.91 -2.95 4.37 7.32 0.24
## P3525_N 58 856 0 1 -0.02 -0.01 1.01 -2.78 5.06 7.84 0.19
## pCASP9_N 59 856 0 1 -0.09 -0.03 1.05 -2.83 4.28 7.11 0.42
## PSD95_N 60 856 0 1 0.04 0.05 1.00 -4.00 2.42 6.42 -0.50
## SNCA_N 61 856 0 1 -0.10 -0.06 0.93 -2.18 4.09 6.27 0.56
## Ubiquitin_N 62 856 0 1 -0.04 -0.01 1.04 -2.78 3.77 6.56 0.20
## pGSK3B_Tyr216_N 63 856 0 1 0.01 0.02 0.99 -2.86 3.75 6.61 -0.09
## SHH_N 64 856 0 1 -0.10 -0.08 0.89 -2.45 4.49 6.95 0.96
## pS6_N 65 856 0 1 0.01 0.00 1.08 -3.74 2.56 6.29 -0.08
## SYP_N 66 856 0 1 0.03 0.01 1.04 -2.82 4.49 7.31 0.05
## CaNA_N 67 856 0 1 -0.10 -0.03 1.16 -2.38 2.54 4.92 0.21
## kurtosis se
## DYRK1A_N 26.32 0.03
## ITSN1_N 21.09 0.03
## BDNF_N 0.37 0.03
## NR1_N -0.05 0.03
## NR2A_N 0.38 0.03
## pAKT_N 1.51 0.03
## pBRAF_N 1.52 0.03
## pCAMKII_N -0.59 0.03
## pCREB_N -0.12 0.03
## pELK_N 26.07 0.03
## pERK_N 21.50 0.03
## pJNK_N 2.26 0.03
## PKCA_N -0.35 0.03
## pMEK_N 1.13 0.03
## pNR1_N 0.50 0.03
## pNR2A_N -0.03 0.03
## pNR2B_N 0.72 0.03
## pPKCAB_N -0.56 0.03
## pRSK_N 2.67 0.03
## AKT_N 1.53 0.03
## BRAF_N 27.35 0.03
## CAMKII_N 0.90 0.03
## CREB_N 1.33 0.03
## ERK_N 0.50 0.03
## GSK3B_N 4.31 0.03
## JNK_N 4.24 0.03
## TRKA_N 0.04 0.03
## RSK_N 1.29 0.03
## APP_N 0.19 0.03
## SOD1_N 1.20 0.03
## MTOR_N 0.44 0.03
## P38_N 1.82 0.03
## pMTOR_N 2.49 0.03
## DSCR1_N 2.17 0.03
## AMPKA_N 1.37 0.03
## NR2B_N 2.20 0.03
## pNUMB_N 0.65 0.03
## RAPTOR_N 0.68 0.03
## TIAM1_N 1.31 0.03
## pP70S6_N 1.78 0.03
## NUMB_N 0.86 0.03
## P70S6_N 0.33 0.03
## pGSK3B_N 1.23 0.03
## pPKCG_N -0.46 0.03
## CDK5_N 0.18 0.03
## S6_N -0.76 0.03
## ADARB1_N 0.85 0.03
## AcetylH3K9_N 9.22 0.03
## RRP1_N 52.78 0.03
## BAX_N 1.57 0.03
## ARC_N -0.37 0.03
## nNOS_N 0.19 0.03
## Tau_N 4.71 0.03
## GFAP_N 4.49 0.03
## GluR3_N -0.25 0.03
## GluR4_N 67.36 0.03
## IL1B_N 0.64 0.03
## P3525_N 0.25 0.03
## pCASP9_N 0.67 0.03
## PSD95_N 0.51 0.03
## SNCA_N 0.33 0.03
## Ubiquitin_N 0.00 0.03
## pGSK3B_Tyr216_N 0.08 0.03
## SHH_N 1.85 0.03
## pS6_N -0.37 0.03
## SYP_N 0.31 0.03
## CaNA_N -0.76 0.03
psych::describe(norm_test_data)
## vars n mean sd median trimmed mad min max range skew
## DYRK1A_N 1 221 -0.01 0.99 -0.24 -0.16 0.47 -1.12 8.13 9.25 4.61
## ITSN1_N 2 221 0.01 0.96 -0.21 -0.11 0.60 -1.46 7.73 9.19 3.62
## BDNF_N 3 221 0.03 0.92 -0.04 0.01 0.86 -2.65 2.53 5.18 0.21
## NR1_N 4 221 0.07 0.90 0.00 0.05 0.93 -2.51 2.49 5.00 0.16
## NR2A_N 5 221 0.04 0.96 -0.05 -0.03 0.95 -1.96 3.49 5.45 0.63
## pAKT_N 6 221 0.15 1.07 0.07 0.09 0.89 -2.30 7.50 9.80 1.64
## pBRAF_N 7 221 0.04 0.97 0.11 0.03 0.78 -2.72 4.97 7.69 0.56
## pCAMKII_N 8 221 -0.03 0.96 -0.25 -0.11 0.95 -1.52 2.24 3.77 0.68
## pCREB_N 9 221 0.07 0.97 0.01 0.07 0.95 -2.58 2.82 5.41 0.09
## pELK_N 10 221 0.03 0.85 -0.08 -0.06 0.49 -1.32 6.05 7.37 3.30
## pERK_N 11 221 -0.02 0.98 -0.30 -0.16 0.53 -1.11 7.25 8.37 4.19
## pJNK_N 12 221 0.13 0.88 0.21 0.19 0.84 -2.42 1.85 4.27 -0.54
## PKCA_N 13 221 0.01 0.92 -0.11 -0.03 0.86 -2.29 2.68 4.97 0.34
## pMEK_N 14 221 0.08 0.90 0.19 0.10 0.81 -2.72 2.75 5.46 -0.13
## pNR1_N 15 221 0.06 0.94 0.05 0.01 0.92 -2.25 3.02 5.27 0.40
## pNR2A_N 16 221 0.05 0.99 -0.01 0.00 1.01 -2.00 3.02 5.03 0.39
## pNR2B_N 17 221 0.09 0.96 0.09 0.09 0.95 -4.56 2.71 7.26 -0.30
## pPKCAB_N 18 221 0.02 0.95 -0.32 -0.07 0.80 -1.50 3.16 4.66 0.78
## pRSK_N 19 221 0.06 0.95 -0.05 0.02 0.80 -2.17 2.84 5.02 0.45
## AKT_N 20 221 0.06 0.90 0.00 0.05 0.91 -2.66 2.58 5.24 0.08
## BRAF_N 21 221 -0.01 0.97 -0.24 -0.16 0.44 -1.07 7.41 8.48 4.81
## CAMKII_N 22 221 0.05 0.92 0.01 0.03 0.81 -2.39 3.00 5.39 0.26
## CREB_N 23 221 0.02 0.97 -0.01 0.00 0.99 -2.51 3.52 6.04 0.37
## ERK_N 24 221 0.08 0.98 -0.08 0.02 1.04 -1.85 2.65 4.50 0.52
## GSK3B_N 25 221 0.04 1.00 -0.05 -0.01 0.95 -4.16 4.70 8.87 0.69
## JNK_N 26 221 0.14 0.84 0.20 0.17 0.84 -2.49 2.16 4.65 -0.43
## TRKA_N 27 221 0.11 0.93 0.21 0.16 0.82 -4.02 2.09 6.10 -0.71
## RSK_N 28 221 0.10 1.01 0.04 0.07 0.91 -2.16 3.44 5.60 0.33
## APP_N 29 221 0.07 0.91 0.03 0.07 0.85 -2.70 2.24 4.94 -0.02
## SOD1_N 30 221 0.04 1.13 -0.40 -0.14 0.70 -1.09 4.84 5.93 1.54
## MTOR_N 31 221 -0.03 1.10 -0.02 -0.02 0.84 -3.92 3.49 7.41 -0.13
## P38_N 32 221 -0.03 1.09 -0.13 -0.12 1.06 -2.15 3.99 6.14 0.81
## pMTOR_N 33 221 0.03 1.03 0.06 0.08 1.08 -4.86 2.28 7.14 -1.10
## DSCR1_N 34 221 0.03 1.02 -0.08 0.02 0.64 -4.28 3.10 7.38 -0.34
## AMPKA_N 35 221 0.01 0.95 -0.09 -0.04 0.89 -2.03 2.67 4.69 0.43
## NR2B_N 36 221 0.02 1.00 0.01 0.06 0.88 -4.31 2.27 6.58 -0.74
## pNUMB_N 37 221 0.02 0.98 -0.16 -0.04 0.92 -2.29 3.00 5.29 0.52
## RAPTOR_N 38 221 0.04 1.01 -0.09 -0.01 1.04 -2.00 2.50 4.51 0.41
## TIAM1_N 39 221 0.02 0.95 -0.14 -0.02 0.91 -2.34 3.47 5.81 0.51
## pP70S6_N 40 221 -0.05 1.02 -0.17 -0.11 1.10 -1.70 4.71 6.41 0.78
## NUMB_N 41 221 0.05 1.08 -0.04 -0.02 1.08 -2.16 4.71 6.86 0.67
## P70S6_N 42 221 0.06 1.00 0.06 0.02 0.98 -1.91 4.27 6.18 0.55
## pGSK3B_N 43 221 0.04 1.04 -0.03 0.00 0.93 -2.27 4.83 7.10 0.68
## pPKCG_N 44 221 0.01 1.04 -0.10 -0.03 1.11 -1.90 2.60 4.50 0.30
## CDK5_N 45 221 0.19 1.43 0.11 0.13 0.84 -2.60 15.53 18.13 5.56
## S6_N 46 221 -0.02 1.01 -0.13 -0.07 1.11 -1.95 2.75 4.70 0.43
## ADARB1_N 47 221 -0.03 0.92 -0.24 -0.11 0.81 -1.77 3.27 5.04 0.87
## AcetylH3K9_N 48 221 -0.07 0.87 -0.34 -0.25 0.52 -0.85 3.77 4.62 2.14
## RRP1_N 49 221 -0.05 1.08 -0.18 -0.13 0.68 -7.28 8.32 15.60 1.16
## BAX_N 50 221 0.03 1.04 0.13 0.07 0.95 -3.52 3.31 6.84 -0.40
## ARC_N 51 221 -0.02 0.92 0.00 -0.01 1.02 -2.18 2.23 4.41 -0.09
## nNOS_N 52 221 -0.01 1.04 0.09 0.02 0.97 -2.71 3.20 5.92 -0.18
## Tau_N 53 221 -0.08 0.86 -0.36 -0.22 0.48 -1.19 3.63 4.81 1.66
## GFAP_N 54 221 -0.04 1.00 -0.06 -0.07 0.84 -2.50 4.10 6.60 0.55
## GluR3_N 55 221 -0.07 1.09 -0.26 -0.12 1.05 -3.25 2.86 6.11 0.31
## GluR4_N 56 221 -0.07 0.82 -0.16 -0.11 1.02 -1.49 2.17 3.65 0.39
## IL1B_N 57 221 -0.05 0.96 -0.05 -0.07 0.99 -1.99 2.47 4.47 0.18
## P3525_N 58 221 -0.01 0.98 -0.03 -0.04 1.11 -2.48 2.36 4.84 0.19
## pCASP9_N 59 221 0.10 1.09 -0.06 0.05 1.00 -2.05 3.74 5.79 0.53
## PSD95_N 60 221 0.07 0.97 0.06 0.10 0.95 -2.80 2.52 5.33 -0.26
## SNCA_N 61 221 -0.06 1.07 -0.13 -0.13 1.02 -2.47 3.83 6.30 0.73
## Ubiquitin_N 62 221 0.06 0.97 0.10 0.07 0.98 -2.26 2.74 4.99 -0.07
## pGSK3B_Tyr216_N 63 221 -0.01 0.96 0.03 0.00 0.88 -2.81 2.57 5.38 -0.20
## SHH_N 64 221 -0.10 0.97 -0.19 -0.18 0.85 -1.88 4.29 6.17 0.96
## pS6_N 65 221 -0.02 0.92 0.00 -0.01 1.02 -2.18 2.23 4.41 -0.09
## SYP_N 66 221 0.10 1.04 0.13 0.08 1.11 -2.43 4.77 7.21 0.39
## CaNA_N 67 221 0.08 1.06 0.10 0.07 1.27 -2.25 2.46 4.71 0.07
## kurtosis se
## DYRK1A_N 30.34 0.07
## ITSN1_N 22.23 0.06
## BDNF_N 0.08 0.06
## NR1_N -0.19 0.06
## NR2A_N 0.20 0.06
## pAKT_N 8.77 0.07
## pBRAF_N 2.88 0.06
## pCAMKII_N -0.59 0.06
## pCREB_N 0.34 0.07
## pELK_N 17.67 0.06
## pERK_N 25.19 0.07
## pJNK_N -0.04 0.06
## PKCA_N -0.03 0.06
## pMEK_N 0.09 0.06
## pNR1_N 0.16 0.06
## pNR2A_N -0.23 0.07
## pNR2B_N 1.70 0.06
## pPKCAB_N -0.27 0.06
## pRSK_N 0.16 0.06
## AKT_N -0.05 0.06
## BRAF_N 30.64 0.06
## CAMKII_N 0.41 0.06
## CREB_N 0.50 0.07
## ERK_N -0.46 0.07
## GSK3B_N 3.95 0.07
## JNK_N 0.19 0.06
## TRKA_N 1.24 0.06
## RSK_N -0.02 0.07
## APP_N -0.03 0.06
## SOD1_N 2.46 0.08
## MTOR_N 0.62 0.07
## P38_N 0.77 0.07
## pMTOR_N 3.42 0.07
## DSCR1_N 3.26 0.07
## AMPKA_N -0.28 0.06
## NR2B_N 2.12 0.07
## pNUMB_N 0.01 0.07
## RAPTOR_N -0.65 0.07
## TIAM1_N 0.35 0.06
## pP70S6_N 1.32 0.07
## NUMB_N 0.80 0.07
## P70S6_N 0.78 0.07
## pGSK3B_N 1.72 0.07
## pPKCG_N -0.63 0.07
## CDK5_N 58.10 0.10
## S6_N -0.58 0.07
## ADARB1_N 0.67 0.06
## AcetylH3K9_N 4.88 0.06
## RRP1_N 24.13 0.07
## BAX_N 0.69 0.07
## ARC_N -0.65 0.06
## nNOS_N 0.00 0.07
## Tau_N 2.69 0.06
## GFAP_N 1.54 0.07
## GluR3_N 0.00 0.07
## GluR4_N -0.77 0.05
## IL1B_N -0.39 0.06
## P3525_N -0.40 0.07
## pCASP9_N 0.04 0.07
## PSD95_N 0.15 0.07
## SNCA_N 0.74 0.07
## Ubiquitin_N -0.43 0.07
## pGSK3B_Tyr216_N -0.16 0.06
## SHH_N 1.80 0.07
## pS6_N -0.65 0.06
## SYP_N 0.95 0.07
## CaNA_N -0.72 0.07
Taking a look at the heatmap, correlation of the variables:
# Visualization of correlation matrix
cor_matr <- cor(norm_training_data)
melted_cormat <- melt(cor_matr, )
ggplot(data = melted_cormat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=5),
axis.text.y = element_text(size=5))
Filtering off columns that are highly correlated (>= 0.9) with each other
diag(cor_matr) <- 0
threshold <- 0.9
# Names of columns with high correlation
colnames(cor_matr[apply(abs(cor_matr) >= threshold, 1, any), apply(abs(cor_matr) >= threshold, 1, any)])
## [1] "DYRK1A_N" "ITSN1_N" "NR1_N" "pERK_N" "pNR1_N" "pNR2B_N" "BRAF_N"
## [8] "ARC_N" "pS6_N"
# Filtering off columns with high correlation, for training and test data
filt_norm_training_data <- norm_training_data[, -which(names(norm_training_data) %in% colnames(cor_matr[apply(abs(cor_matr) >= threshold, 1, any), apply(abs(cor_matr) >= threshold, 1, any)]))]
filt_norm_test_data <- norm_test_data[, -which(names(norm_test_data) %in% colnames(cor_matr[apply(abs(cor_matr) >= threshold, 1, any), apply(abs(cor_matr) >= threshold, 1, any)]))]
# Visualization of correlation matrix
cor_matr <- cor(filt_norm_training_data)
melted_cormat <- melt(cor_matr, )
ggplot(data = melted_cormat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=5),
axis.text.y = element_text(size=5))
final_training_data <- cbind(filt_norm_training_data, dplyr::select(training_data, ERBB4_N))
# final_test_data <- cbind(filt_norm_test_data, dplyr::select(test_data, ERBB4_N))
simple_model <- lm(ERBB4_N ~ ., data = final_training_data)
summary(simple_model)
##
## Call:
## lm(formula = ERBB4_N ~ ., data = final_training_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.026050 -0.004137 0.000132 0.003863 0.035045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.565e-01 2.394e-04 653.772 < 2e-16 ***
## BDNF_N -4.695e-04 8.136e-04 -0.577 0.564098
## NR2A_N 5.314e-04 1.023e-03 0.519 0.603565
## pAKT_N 2.090e-03 8.215e-04 2.544 0.011137 *
## pBRAF_N -1.613e-03 7.297e-04 -2.211 0.027347 *
## pCAMKII_N -3.830e-04 6.375e-04 -0.601 0.548181
## pCREB_N -1.329e-03 6.996e-04 -1.900 0.057854 .
## pELK_N -8.827e-04 4.464e-04 -1.978 0.048319 *
## pJNK_N -1.640e-03 9.884e-04 -1.659 0.097413 .
## PKCA_N 2.193e-03 1.030e-03 2.130 0.033502 *
## pMEK_N -1.062e-03 8.734e-04 -1.216 0.224189
## pNR2A_N 1.242e-04 8.276e-04 0.150 0.880771
## pPKCAB_N 3.470e-04 1.070e-03 0.324 0.745883
## pRSK_N 7.182e-04 7.035e-04 1.021 0.307611
## AKT_N 1.718e-03 8.590e-04 1.999 0.045899 *
## CAMKII_N 1.443e-03 7.796e-04 1.851 0.064574 .
## CREB_N -8.773e-04 6.147e-04 -1.427 0.153915
## ERK_N 2.564e-03 1.196e-03 2.144 0.032369 *
## GSK3B_N -6.041e-04 9.287e-04 -0.651 0.515520
## JNK_N -6.401e-04 8.242e-04 -0.777 0.437591
## TRKA_N -2.535e-04 9.274e-04 -0.273 0.784702
## RSK_N 1.095e-03 8.066e-04 1.358 0.174913
## APP_N 2.412e-04 5.439e-04 0.443 0.657571
## SOD1_N -1.047e-03 5.678e-04 -1.844 0.065515 .
## MTOR_N 1.895e-03 8.767e-04 2.161 0.030986 *
## P38_N 7.919e-04 8.703e-04 0.910 0.363139
## pMTOR_N -1.227e-03 7.882e-04 -1.556 0.120056
## DSCR1_N -2.348e-04 7.126e-04 -0.330 0.741829
## AMPKA_N -1.250e-03 1.086e-03 -1.151 0.249888
## NR2B_N -2.692e-04 6.758e-04 -0.398 0.690492
## pNUMB_N 6.387e-05 9.250e-04 0.069 0.944968
## RAPTOR_N 1.365e-03 1.039e-03 1.314 0.189184
## TIAM1_N -1.421e-03 8.972e-04 -1.584 0.113694
## pP70S6_N 4.263e-04 6.844e-04 0.623 0.533565
## NUMB_N -2.107e-03 7.458e-04 -2.826 0.004834 **
## P70S6_N -4.616e-04 6.790e-04 -0.680 0.496859
## pGSK3B_N 1.829e-03 5.909e-04 3.095 0.002037 **
## pPKCG_N -4.175e-03 8.257e-04 -5.057 5.29e-07 ***
## CDK5_N 2.560e-03 5.444e-04 4.703 3.02e-06 ***
## S6_N 4.976e-04 6.633e-04 0.750 0.453356
## ADARB1_N 8.465e-04 5.453e-04 1.552 0.120989
## AcetylH3K9_N 2.560e-03 8.196e-04 3.123 0.001856 **
## RRP1_N -1.047e-03 3.500e-04 -2.992 0.002855 **
## BAX_N 6.193e-04 4.929e-04 1.257 0.209282
## nNOS_N 7.826e-04 5.045e-04 1.551 0.121292
## Tau_N 2.697e-03 8.433e-04 3.198 0.001438 **
## GFAP_N -1.537e-03 4.631e-04 -3.318 0.000948 ***
## GluR3_N -1.967e-03 4.918e-04 -3.998 6.97e-05 ***
## GluR4_N -5.521e-04 3.305e-04 -1.670 0.095242 .
## IL1B_N 3.110e-03 6.907e-04 4.503 7.69e-06 ***
## P3525_N 2.579e-03 5.401e-04 4.774 2.15e-06 ***
## pCASP9_N 2.794e-03 6.150e-04 4.544 6.39e-06 ***
## PSD95_N 4.541e-03 5.731e-04 7.924 7.73e-15 ***
## SNCA_N 2.917e-04 6.193e-04 0.471 0.637733
## Ubiquitin_N 9.152e-04 6.612e-04 1.384 0.166710
## pGSK3B_Tyr216_N 1.361e-03 7.107e-04 1.916 0.055784 .
## SHH_N 2.482e-04 4.162e-04 0.596 0.551154
## SYP_N 1.304e-03 4.729e-04 2.757 0.005968 **
## CaNA_N -2.752e-03 8.812e-04 -3.123 0.001854 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007004 on 797 degrees of freedom
## Multiple R-squared: 0.8064, Adjusted R-squared: 0.7923
## F-statistic: 57.23 on 58 and 797 DF, p-value: < 2.2e-16
y_pred <- predict(simple_model, filt_norm_test_data)
Linear model with multiple R\(^2\) of 0.806371 and adjusted R\(^2\) of 0.7922801. ## 3.1 Diagnostics of the linear model
plot(simple_model)
actuals_preds <- data.frame(cbind(actuals=test_data$ERBB4_N, predicteds=y_pred))
# Very low correlation between the predicted and the actual data
correlation_accuracy <- cor(actuals_preds)
correlation_accuracy
## actuals predicteds
## actuals 1.0000000 0.8591167
## predicteds 0.8591167 1.0000000
# MAE and MSE in comparison to data
mae <- mae(final_training_data$ERBB4_N, predict(simple_model))
mse <- mse(final_training_data$ERBB4_N, predict(simple_model))
In general, at least correlation between the predicted and the actual data relatively high - 0.8591167. MAE is equal to 0.005091 and MSE to 4.5674764^{-5}.
This model is not good - we have only a couple of significant predictors. Also heteroscedasticity and high leverage was observed.
# Taking a dataframe with numeric data
less_na_mice_df <- mice_df[rowSums(is.na(mice_df)) <= 5 , colSums(is.na(mice_df)) <= 5]
less_na_num_mice_df <- as.data.frame(select_if(less_na_mice_df, is.numeric))
mice_pca <- rda(less_na_num_mice_df, scale = TRUE)
head(summary(mice_pca))
##
## Call:
## rda(X = less_na_num_mice_df, scale = TRUE)
##
## Partitioning of correlations:
## Inertia Proportion
## Total 68 1
## Unconstrained 68 1
##
## Eigenvalues, and their contribution to the correlations
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 17.5792 11.3018 7.6909 5.20439 3.41001 3.33135 2.34509
## Proportion Explained 0.2585 0.1662 0.1131 0.07654 0.05015 0.04899 0.03449
## Cumulative Proportion 0.2585 0.4247 0.5378 0.61436 0.66450 0.71350 0.74798
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Eigenvalue 2.19968 1.68820 1.17887 1.08052 0.87942 0.79428 0.68146
## Proportion Explained 0.03235 0.02483 0.01734 0.01589 0.01293 0.01168 0.01002
## Cumulative Proportion 0.78033 0.80516 0.82249 0.83838 0.85132 0.86300 0.87302
## PC15 PC16 PC17 PC18 PC19 PC20
## Eigenvalue 0.62761 0.595110 0.53652 0.481644 0.425525 0.414652
## Proportion Explained 0.00923 0.008752 0.00789 0.007083 0.006258 0.006098
## Cumulative Proportion 0.88225 0.890999 0.89889 0.905972 0.912230 0.918327
## PC21 PC22 PC23 PC24 PC25 PC26
## Eigenvalue 0.373630 0.36585 0.342219 0.310379 0.289510 0.256305
## Proportion Explained 0.005495 0.00538 0.005033 0.004564 0.004258 0.003769
## Cumulative Proportion 0.923822 0.92920 0.934235 0.938799 0.943057 0.946826
## PC27 PC28 PC29 PC30 PC31 PC32
## Eigenvalue 0.24004 0.22850 0.209047 0.185080 0.17139 0.167484
## Proportion Explained 0.00353 0.00336 0.003074 0.002722 0.00252 0.002463
## Cumulative Proportion 0.95036 0.95372 0.956790 0.959512 0.96203 0.964495
## PC33 PC34 PC35 PC36 PC37 PC38
## Eigenvalue 0.154874 0.146368 0.14211 0.134000 0.11967 0.11628
## Proportion Explained 0.002278 0.002152 0.00209 0.001971 0.00176 0.00171
## Cumulative Proportion 0.966773 0.968925 0.97102 0.972986 0.97475 0.97646
## PC39 PC40 PC41 PC42 PC43 PC44
## Eigenvalue 0.115004 0.110966 0.102441 0.094781 0.088919 0.083289
## Proportion Explained 0.001691 0.001632 0.001506 0.001394 0.001308 0.001225
## Cumulative Proportion 0.978147 0.979779 0.981285 0.982679 0.983987 0.985212
## PC45 PC46 PC47 PC48 PC49 PC50
## Eigenvalue 0.081005 0.078022 0.072361 0.068824 0.0614983 0.0596840
## Proportion Explained 0.001191 0.001147 0.001064 0.001012 0.0009044 0.0008777
## Cumulative Proportion 0.986403 0.987550 0.988614 0.989626 0.9905308 0.9914085
## PC51 PC52 PC53 PC54 PC55
## Eigenvalue 0.0576604 0.052293 0.0500817 0.0487816 0.0459844
## Proportion Explained 0.0008479 0.000769 0.0007365 0.0007174 0.0006762
## Cumulative Proportion 0.9922565 0.993025 0.9937620 0.9944793 0.9951556
## PC56 PC57 PC58 PC59 PC60
## Eigenvalue 0.0417325 0.0380500 0.0372162 0.0361970 0.0349061
## Proportion Explained 0.0006137 0.0005596 0.0005473 0.0005323 0.0005133
## Cumulative Proportion 0.9957693 0.9963289 0.9968762 0.9974085 0.9979218
## PC61 PC62 PC63 PC64 PC65
## Eigenvalue 0.0294616 0.0265311 0.0239294 0.0188801 0.0159487
## Proportion Explained 0.0004333 0.0003902 0.0003519 0.0002776 0.0002345
## Cumulative Proportion 0.9983550 0.9987452 0.9990971 0.9993748 0.9996093
## PC66 PC67
## Eigenvalue 0.0152890 0.0112783
## Proportion Explained 0.0002248 0.0001659
## Cumulative Proportion 0.9998341 1.0000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 16.44676
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## DYRK1A_N -0.2962 1.5203 -0.4143 0.08440 -0.58784 0.41917
## ITSN1_N -0.5279 1.5858 -0.2095 0.03611 -0.49695 0.34147
## BDNF_N -1.6652 0.6923 -0.2610 0.02034 0.22156 -0.08032
## NR1_N -1.6258 0.6647 0.3040 0.39382 -0.05124 -0.37477
## NR2A_N -1.4908 0.7660 0.2516 0.74407 0.11890 -0.21675
## pAKT_N -1.2065 -0.8785 -0.5228 -0.79888 -0.46269 -0.04406
## ....
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## sit1 -0.3894055 0.7360 -0.3432 0.44838 0.12295 -0.4314
## sit2 -0.2050927 0.7099 -0.1697 0.32827 0.08360 -0.3154
## sit3 -0.2110797 0.7436 -0.1132 0.26283 0.02917 -0.3066
## sit4 0.0008977 0.5350 -0.4709 0.13833 0.05795 -0.4586
## sit5 0.1353588 0.5387 -0.2524 0.05932 0.03687 -0.3963
## sit6 0.1391488 0.5702 -0.2846 0.07598 -0.05534 -0.4595
## ....
biplot(mice_pca, scaling = "sites", display = "sites")
biplot(mice_pca, scaling = "species", display = "species")
screeplot(mice_pca, type = "lines", bstick = TRUE)
How much percent does each PCA component explain.
pca_summary <- summary(mice_pca)
pca_result <- as.data.frame(pca_summary$cont)
plot_data <- as.data.frame(t(pca_result[c("Proportion Explained"),]))
plot_data$component <- rownames(plot_data)
plot_data <- plot_data %>%
arrange(-`Proportion Explained`) %>%
separate(component, c("imp", "component")) %>%
mutate(component = factor(component, levels=component),
prop_explained = `Proportion Explained` * 100)
# How much percent does each component explain
plot_data[, c("component", "prop_explained")]
## component prop_explained
## importance.PC1 PC1 25.85177170
## importance.PC2 PC2 16.62030357
## importance.PC3 PC3 11.31017464
## importance.PC4 PC4 7.65351090
## importance.PC5 PC5 5.01471596
## importance.PC6 PC6 4.89905059
## importance.PC7 PC7 3.44866632
## importance.PC8 PC8 3.23482944
## importance.PC9 PC9 2.48265355
## importance.PC10 PC10 1.73362830
## importance.PC11 PC11 1.58899585
## importance.PC12 PC12 1.29325888
## importance.PC13 PC13 1.16806244
## importance.PC14 PC14 1.00214124
## importance.PC15 PC15 0.92295583
## importance.PC16 PC16 0.87516157
## importance.PC17 PC17 0.78899804
## importance.PC18 PC18 0.70829939
## importance.PC19 PC19 0.62577235
## importance.PC20 PC20 0.60978235
## importance.PC21 PC21 0.54945638
## importance.PC22 PC22 0.53800821
## importance.PC23 PC23 0.50326284
## importance.PC24 PC24 0.45643946
## importance.PC25 PC25 0.42575025
## importance.PC26 PC26 0.37691852
## importance.PC27 PC27 0.35300378
## importance.PC28 PC28 0.33602433
## importance.PC29 PC29 0.30742146
## importance.PC30 PC30 0.27217649
## importance.PC31 PC31 0.25203928
## importance.PC32 PC32 0.24629937
## importance.PC33 PC33 0.22775642
## importance.PC34 PC34 0.21524745
## importance.PC35 PC35 0.20898496
## importance.PC36 PC36 0.19705951
## importance.PC37 PC37 0.17598911
## importance.PC38 PC38 0.17099336
## importance.PC39 PC39 0.16912308
## importance.PC40 PC40 0.16318534
## importance.PC41 PC41 0.15064921
## importance.PC42 PC42 0.13938309
## importance.PC43 PC43 0.13076360
## importance.PC44 PC44 0.12248386
## importance.PC45 PC45 0.11912567
## importance.PC46 PC46 0.11473778
## importance.PC47 PC47 0.10641366
## importance.PC48 PC48 0.10121210
## importance.PC49 PC49 0.09043872
## importance.PC50 PC50 0.08777064
## importance.PC51 PC51 0.08479476
## importance.PC52 PC52 0.07690171
## importance.PC53 PC53 0.07364961
## importance.PC54 PC54 0.07173768
## importance.PC55 PC55 0.06762415
## importance.PC56 PC56 0.06137127
## importance.PC57 PC57 0.05595590
## importance.PC58 PC58 0.05472967
## importance.PC59 PC59 0.05323086
## importance.PC60 PC60 0.05133257
## importance.PC61 PC61 0.04332587
## importance.PC62 PC62 0.03901626
## importance.PC63 PC63 0.03519034
## importance.PC64 PC64 0.02776489
## importance.PC65 PC65 0.02345397
## importance.PC66 PC66 0.02248387
## importance.PC67 PC67 0.01658575
# Plotting first ten
ggplot(plot_data[1:10, ], aes(component, prop_explained)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=5),
panel.background = element_rect(fill = "white", colour = "grey50")) +
xlab("Component") +
ylab("Proportion explained [%]")
Cumulative proportion
plot_data <- plot_data %>%
mutate(cum_prop = cumsum(prop_explained))
# Plotting till explained cumulative proportion > 90 %
ggplot(plot_data[1:which(plot_data$cum_prop > 90)[1], ], aes(component, cum_prop)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=5),
panel.background = element_rect(fill = "white", colour = "grey50")) +
xlab("Component") +
ylab("Proportion explained [%]")
First 18 PCA components needed to explain more than 90 % of data.
prin_comp <- prcomp(less_na_num_mice_df, rank. = 3)
components <- prin_comp[["x"]]
components <- data.frame(components)
components$PC2 <- -components$PC2
components$PC3 <- -components$PC3
components = cbind(components, less_na_mice_df$class)
fig <- plot_ly(components, x = ~PC1, y = ~PC2, z = ~PC3, color = ~less_na_mice_df$class, colors = c('#636EFA','#EF553B','#00CC96') ) %>%
add_markers(size = 12)
fig <- fig %>%
layout(
scene = list(bgcolor = "#e5ecf6")
)
fig
Building the
model <- pcr(ERBB4_N ~ ., data = final_training_data, scale=TRUE, validation="CV")
summary(model)
## Data: X dimension: 856 58
## Y dimension: 856 1
## Fit method: svdpc
## Number of components considered: 58
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 0.01538 0.01439 0.01445 0.01028 0.01026 0.009547 0.009526
## adjCV 0.01538 0.01438 0.01445 0.01027 0.01025 0.009529 0.009517
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 0.009540 0.008953 0.008677 0.008609 0.008601 0.008178 0.008068
## adjCV 0.009533 0.008946 0.008665 0.008601 0.008597 0.008130 0.008059
## 14 comps 15 comps 16 comps 17 comps 18 comps 19 comps 20 comps
## CV 0.008066 0.007870 0.007878 0.007896 0.007859 0.007856 0.007786
## adjCV 0.008096 0.007855 0.007866 0.007884 0.007843 0.007854 0.007783
## 21 comps 22 comps 23 comps 24 comps 25 comps 26 comps 27 comps
## CV 0.007835 0.007751 0.007752 0.007726 0.007564 0.007575 0.007583
## adjCV 0.007833 0.007732 0.007742 0.007721 0.007547 0.007561 0.007575
## 28 comps 29 comps 30 comps 31 comps 32 comps 33 comps 34 comps
## CV 0.007554 0.007392 0.007404 0.007419 0.007425 0.007447 0.007420
## adjCV 0.007588 0.007365 0.007382 0.007397 0.007407 0.007430 0.007398
## 35 comps 36 comps 37 comps 38 comps 39 comps 40 comps 41 comps
## CV 0.007407 0.007408 0.007438 0.007451 0.007456 0.007421 0.007461
## adjCV 0.007382 0.007384 0.007414 0.007427 0.007432 0.007391 0.007434
## 42 comps 43 comps 44 comps 45 comps 46 comps 47 comps 48 comps
## CV 0.007464 0.007447 0.007431 0.007423 0.007429 0.007415 0.007473
## adjCV 0.007438 0.007430 0.007406 0.007395 0.007402 0.007380 0.007439
## 49 comps 50 comps 51 comps 52 comps 53 comps 54 comps 55 comps
## CV 0.007434 0.007426 0.007432 0.007419 0.007449 0.007497 0.007514
## adjCV 0.007403 0.007402 0.007394 0.007381 0.007411 0.007458 0.007472
## 56 comps 57 comps 58 comps
## CV 0.007532 0.007543 0.007553
## adjCV 0.007490 0.007500 0.007509
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 26.56 41.90 52.55 60.53 66.15 71.25 75.19 77.99
## ERBB4_N 13.42 13.42 56.04 56.28 62.29 62.54 62.58 66.99
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 80.14 81.98 83.56 84.97 86.31 87.36 88.37
## ERBB4_N 69.10 69.54 69.63 72.49 73.38 73.39 74.80
## 16 comps 17 comps 18 comps 19 comps 20 comps 21 comps 22 comps
## X 89.29 90.17 90.91 91.58 92.23 92.83 93.4
## ERBB4_N 74.80 74.80 75.01 75.01 75.60 75.67 76.3
## 23 comps 24 comps 25 comps 26 comps 27 comps 28 comps 29 comps
## X 93.93 94.39 94.81 95.20 95.55 95.88 96.20
## ERBB4_N 76.30 76.40 77.39 77.39 77.48 77.50 78.86
## 30 comps 31 comps 32 comps 33 comps 34 comps 35 comps 36 comps
## X 96.49 96.76 97.01 97.25 97.47 97.67 97.85
## ERBB4_N 78.88 78.89 78.91 78.93 79.19 79.29 79.31
## 37 comps 38 comps 39 comps 40 comps 41 comps 42 comps 43 comps
## X 98.03 98.19 98.34 98.49 98.63 98.76 98.88
## ERBB4_N 79.31 79.33 79.36 79.53 79.53 79.54 79.58
## 44 comps 45 comps 46 comps 47 comps 48 comps 49 comps 50 comps
## X 98.99 99.10 99.20 99.29 99.38 99.46 99.54
## ERBB4_N 79.74 79.82 79.84 80.01 80.08 80.24 80.30
## 51 comps 52 comps 53 comps 54 comps 55 comps 56 comps 57 comps
## X 99.61 99.68 99.75 99.82 99.87 99.92 99.96
## ERBB4_N 80.49 80.55 80.55 80.56 80.58 80.59 80.60
## 58 comps
## X 100.00
## ERBB4_N 80.64
First 3 components have very low cumulative explained variation
validationplot(model, val.type="RMSEP")
validationplot(model, val.type="MSEP")
validationplot(model, val.type="R2")
As we can see, after adding around 13 components the model does not improve that much anymore. Let’s to prediction:
y_pred <- predict(model, filt_norm_test_data, ncomp=13)
actuals_preds <- data.frame(cbind(actuals=test_data$ERBB4_N, predicteds=y_pred))
plot(model)
# Very low correlation between the predicted and the actual data
correlation_accuracy <- cor(actuals_preds)
correlation_accuracy
## actuals predicteds
## actuals 1.0000000 0.8665048
## predicteds 0.8665048 1.0000000
# MAE and MSE in comparison to data
mae <- mae(final_training_data$ERBB4_N, predict(model))
mse <- mse(final_training_data$ERBB4_N, predict(model))
Correlation between the predicted and the actual data - 0.8665048. MAE is equal to 0.0058996 and MSE to 6.1875897^{-5}. The model is worse than the linear model from before.
Using library limma version 3.52.0.
diff_exp <- less_na_mice_df
str(diff_exp)
## tibble [1,077 × 73] (S3: tbl_df/tbl/data.frame)
## $ MouseID : chr [1:1077] "309_1" "309_2" "309_3" "309_4" ...
## $ DYRK1A_N : num [1:1077] 0.504 0.515 0.509 0.442 0.435 ...
## $ ITSN1_N : num [1:1077] 0.747 0.689 0.73 0.617 0.617 ...
## $ BDNF_N : num [1:1077] 0.43 0.412 0.418 0.359 0.359 ...
## $ NR1_N : num [1:1077] 2.82 2.79 2.69 2.47 2.37 ...
## $ NR2A_N : num [1:1077] 5.99 5.69 5.62 4.98 4.72 ...
## $ pAKT_N : num [1:1077] 0.219 0.212 0.209 0.223 0.213 ...
## $ pBRAF_N : num [1:1077] 0.178 0.173 0.176 0.176 0.174 ...
## $ pCAMKII_N : num [1:1077] 2.37 2.29 2.28 2.15 2.13 ...
## $ pCREB_N : num [1:1077] 0.232 0.227 0.23 0.207 0.192 ...
## $ pELK_N : num [1:1077] 1.75 1.6 1.56 1.6 1.5 ...
## $ pERK_N : num [1:1077] 0.688 0.695 0.677 0.583 0.551 ...
## $ pJNK_N : num [1:1077] 0.306 0.299 0.291 0.297 0.287 ...
## $ PKCA_N : num [1:1077] 0.403 0.386 0.381 0.377 0.364 ...
## $ pMEK_N : num [1:1077] 0.297 0.281 0.282 0.314 0.278 ...
## $ pNR1_N : num [1:1077] 1.022 0.957 1.004 0.875 0.865 ...
## $ pNR2A_N : num [1:1077] 0.606 0.588 0.602 0.52 0.508 ...
## $ pNR2B_N : num [1:1077] 1.88 1.73 1.73 1.57 1.48 ...
## $ pPKCAB_N : num [1:1077] 2.31 2.04 2.02 2.13 2.01 ...
## $ pRSK_N : num [1:1077] 0.442 0.445 0.468 0.478 0.483 ...
## $ AKT_N : num [1:1077] 0.859 0.835 0.814 0.728 0.688 ...
## $ BRAF_N : num [1:1077] 0.416 0.4 0.4 0.386 0.368 ...
## $ CAMKII_N : num [1:1077] 0.37 0.356 0.368 0.363 0.355 ...
## $ CREB_N : num [1:1077] 0.179 0.174 0.174 0.179 0.175 ...
## $ ERK_N : num [1:1077] 3.69 3.49 3.57 2.97 2.9 ...
## $ GSK3B_N : num [1:1077] 1.54 1.51 1.5 1.42 1.36 ...
## $ JNK_N : num [1:1077] 0.265 0.256 0.26 0.26 0.251 ...
## $ TRKA_N : num [1:1077] 0.814 0.781 0.785 0.734 0.703 ...
## $ RSK_N : num [1:1077] 0.166 0.157 0.161 0.162 0.155 ...
## $ APP_N : num [1:1077] 0.454 0.431 0.423 0.411 0.399 ...
## $ SOD1_N : num [1:1077] 0.37 0.342 0.344 0.345 0.329 ...
## $ MTOR_N : num [1:1077] 0.459 0.424 0.425 0.429 0.409 ...
## $ P38_N : num [1:1077] 0.335 0.325 0.325 0.33 0.313 ...
## $ pMTOR_N : num [1:1077] 0.825 0.762 0.757 0.747 0.692 ...
## $ DSCR1_N : num [1:1077] 0.577 0.545 0.544 0.547 0.537 ...
## $ AMPKA_N : num [1:1077] 0.448 0.421 0.405 0.387 0.361 ...
## $ NR2B_N : num [1:1077] 0.586 0.545 0.553 0.548 0.513 ...
## $ pNUMB_N : num [1:1077] 0.395 0.368 0.364 0.367 0.352 ...
## $ RAPTOR_N : num [1:1077] 0.34 0.322 0.313 0.328 0.312 ...
## $ TIAM1_N : num [1:1077] 0.483 0.455 0.447 0.443 0.419 ...
## $ pP70S6_N : num [1:1077] 0.294 0.276 0.257 0.399 0.393 ...
## $ NUMB_N : num [1:1077] 0.182 0.182 0.184 0.162 0.16 ...
## $ P70S6_N : num [1:1077] 0.843 0.848 0.856 0.76 0.768 ...
## $ pGSK3B_N : num [1:1077] 0.193 0.195 0.201 0.184 0.186 ...
## $ pPKCG_N : num [1:1077] 1.44 1.44 1.52 1.61 1.65 ...
## $ CDK5_N : num [1:1077] 0.295 0.294 0.302 0.296 0.297 ...
## $ S6_N : num [1:1077] 0.355 0.355 0.386 0.291 0.309 ...
## $ ADARB1_N : num [1:1077] 1.34 1.31 1.28 1.2 1.21 ...
## $ AcetylH3K9_N : num [1:1077] 0.17 0.171 0.185 0.16 0.165 ...
## $ RRP1_N : num [1:1077] 0.159 0.158 0.149 0.166 0.161 ...
## $ BAX_N : num [1:1077] 0.189 0.185 0.191 0.185 0.188 ...
## $ ARC_N : num [1:1077] 0.106 0.107 0.108 0.103 0.105 ...
## $ ERBB4_N : num [1:1077] 0.145 0.15 0.145 0.141 0.142 ...
## $ nNOS_N : num [1:1077] 0.177 0.178 0.176 0.164 0.168 ...
## $ Tau_N : num [1:1077] 0.125 0.134 0.133 0.123 0.137 ...
## $ GFAP_N : num [1:1077] 0.115 0.118 0.118 0.117 0.116 ...
## $ GluR3_N : num [1:1077] 0.228 0.238 0.245 0.235 0.256 ...
## $ GluR4_N : num [1:1077] 0.143 0.142 0.142 0.145 0.141 ...
## $ IL1B_N : num [1:1077] 0.431 0.457 0.51 0.431 0.481 ...
## $ P3525_N : num [1:1077] 0.248 0.258 0.255 0.251 0.252 ...
## $ pCASP9_N : num [1:1077] 1.6 1.67 1.66 1.48 1.53 ...
## $ PSD95_N : num [1:1077] 2.01 2 2.02 1.96 2.01 ...
## $ SNCA_N : num [1:1077] 0.108 0.11 0.108 0.12 0.12 ...
## $ Ubiquitin_N : num [1:1077] 1.045 1.01 0.997 0.99 0.998 ...
## $ pGSK3B_Tyr216_N: num [1:1077] 0.832 0.849 0.847 0.833 0.879 ...
## $ SHH_N : num [1:1077] 0.189 0.2 0.194 0.192 0.206 ...
## $ pS6_N : num [1:1077] 0.106 0.107 0.108 0.103 0.105 ...
## $ SYP_N : num [1:1077] 0.427 0.442 0.436 0.392 0.434 ...
## $ CaNA_N : num [1:1077] 1.68 1.74 1.93 1.7 1.84 ...
## $ Genotype : chr [1:1077] "Control" "Control" "Control" "Control" ...
## $ Treatment : chr [1:1077] "Memantine" "Memantine" "Memantine" "Memantine" ...
## $ Behavior : chr [1:1077] "C/S" "C/S" "C/S" "C/S" ...
## $ class : Factor w/ 8 levels "c-CS-m","c-CS-s",..: 1 1 1 1 1 1 1 1 1 1 ...
diff_exp$Behavior <- as.factor(diff_exp$Behavior)
diff_exp$Treatment <- as.factor(diff_exp$Treatment)
diff_exp$Genotype <- as.factor(diff_exp$Genotype)
diff_exp$class <- as.factor(diff_exp$class)
str(diff_exp)
## tibble [1,077 × 73] (S3: tbl_df/tbl/data.frame)
## $ MouseID : chr [1:1077] "309_1" "309_2" "309_3" "309_4" ...
## $ DYRK1A_N : num [1:1077] 0.504 0.515 0.509 0.442 0.435 ...
## $ ITSN1_N : num [1:1077] 0.747 0.689 0.73 0.617 0.617 ...
## $ BDNF_N : num [1:1077] 0.43 0.412 0.418 0.359 0.359 ...
## $ NR1_N : num [1:1077] 2.82 2.79 2.69 2.47 2.37 ...
## $ NR2A_N : num [1:1077] 5.99 5.69 5.62 4.98 4.72 ...
## $ pAKT_N : num [1:1077] 0.219 0.212 0.209 0.223 0.213 ...
## $ pBRAF_N : num [1:1077] 0.178 0.173 0.176 0.176 0.174 ...
## $ pCAMKII_N : num [1:1077] 2.37 2.29 2.28 2.15 2.13 ...
## $ pCREB_N : num [1:1077] 0.232 0.227 0.23 0.207 0.192 ...
## $ pELK_N : num [1:1077] 1.75 1.6 1.56 1.6 1.5 ...
## $ pERK_N : num [1:1077] 0.688 0.695 0.677 0.583 0.551 ...
## $ pJNK_N : num [1:1077] 0.306 0.299 0.291 0.297 0.287 ...
## $ PKCA_N : num [1:1077] 0.403 0.386 0.381 0.377 0.364 ...
## $ pMEK_N : num [1:1077] 0.297 0.281 0.282 0.314 0.278 ...
## $ pNR1_N : num [1:1077] 1.022 0.957 1.004 0.875 0.865 ...
## $ pNR2A_N : num [1:1077] 0.606 0.588 0.602 0.52 0.508 ...
## $ pNR2B_N : num [1:1077] 1.88 1.73 1.73 1.57 1.48 ...
## $ pPKCAB_N : num [1:1077] 2.31 2.04 2.02 2.13 2.01 ...
## $ pRSK_N : num [1:1077] 0.442 0.445 0.468 0.478 0.483 ...
## $ AKT_N : num [1:1077] 0.859 0.835 0.814 0.728 0.688 ...
## $ BRAF_N : num [1:1077] 0.416 0.4 0.4 0.386 0.368 ...
## $ CAMKII_N : num [1:1077] 0.37 0.356 0.368 0.363 0.355 ...
## $ CREB_N : num [1:1077] 0.179 0.174 0.174 0.179 0.175 ...
## $ ERK_N : num [1:1077] 3.69 3.49 3.57 2.97 2.9 ...
## $ GSK3B_N : num [1:1077] 1.54 1.51 1.5 1.42 1.36 ...
## $ JNK_N : num [1:1077] 0.265 0.256 0.26 0.26 0.251 ...
## $ TRKA_N : num [1:1077] 0.814 0.781 0.785 0.734 0.703 ...
## $ RSK_N : num [1:1077] 0.166 0.157 0.161 0.162 0.155 ...
## $ APP_N : num [1:1077] 0.454 0.431 0.423 0.411 0.399 ...
## $ SOD1_N : num [1:1077] 0.37 0.342 0.344 0.345 0.329 ...
## $ MTOR_N : num [1:1077] 0.459 0.424 0.425 0.429 0.409 ...
## $ P38_N : num [1:1077] 0.335 0.325 0.325 0.33 0.313 ...
## $ pMTOR_N : num [1:1077] 0.825 0.762 0.757 0.747 0.692 ...
## $ DSCR1_N : num [1:1077] 0.577 0.545 0.544 0.547 0.537 ...
## $ AMPKA_N : num [1:1077] 0.448 0.421 0.405 0.387 0.361 ...
## $ NR2B_N : num [1:1077] 0.586 0.545 0.553 0.548 0.513 ...
## $ pNUMB_N : num [1:1077] 0.395 0.368 0.364 0.367 0.352 ...
## $ RAPTOR_N : num [1:1077] 0.34 0.322 0.313 0.328 0.312 ...
## $ TIAM1_N : num [1:1077] 0.483 0.455 0.447 0.443 0.419 ...
## $ pP70S6_N : num [1:1077] 0.294 0.276 0.257 0.399 0.393 ...
## $ NUMB_N : num [1:1077] 0.182 0.182 0.184 0.162 0.16 ...
## $ P70S6_N : num [1:1077] 0.843 0.848 0.856 0.76 0.768 ...
## $ pGSK3B_N : num [1:1077] 0.193 0.195 0.201 0.184 0.186 ...
## $ pPKCG_N : num [1:1077] 1.44 1.44 1.52 1.61 1.65 ...
## $ CDK5_N : num [1:1077] 0.295 0.294 0.302 0.296 0.297 ...
## $ S6_N : num [1:1077] 0.355 0.355 0.386 0.291 0.309 ...
## $ ADARB1_N : num [1:1077] 1.34 1.31 1.28 1.2 1.21 ...
## $ AcetylH3K9_N : num [1:1077] 0.17 0.171 0.185 0.16 0.165 ...
## $ RRP1_N : num [1:1077] 0.159 0.158 0.149 0.166 0.161 ...
## $ BAX_N : num [1:1077] 0.189 0.185 0.191 0.185 0.188 ...
## $ ARC_N : num [1:1077] 0.106 0.107 0.108 0.103 0.105 ...
## $ ERBB4_N : num [1:1077] 0.145 0.15 0.145 0.141 0.142 ...
## $ nNOS_N : num [1:1077] 0.177 0.178 0.176 0.164 0.168 ...
## $ Tau_N : num [1:1077] 0.125 0.134 0.133 0.123 0.137 ...
## $ GFAP_N : num [1:1077] 0.115 0.118 0.118 0.117 0.116 ...
## $ GluR3_N : num [1:1077] 0.228 0.238 0.245 0.235 0.256 ...
## $ GluR4_N : num [1:1077] 0.143 0.142 0.142 0.145 0.141 ...
## $ IL1B_N : num [1:1077] 0.431 0.457 0.51 0.431 0.481 ...
## $ P3525_N : num [1:1077] 0.248 0.258 0.255 0.251 0.252 ...
## $ pCASP9_N : num [1:1077] 1.6 1.67 1.66 1.48 1.53 ...
## $ PSD95_N : num [1:1077] 2.01 2 2.02 1.96 2.01 ...
## $ SNCA_N : num [1:1077] 0.108 0.11 0.108 0.12 0.12 ...
## $ Ubiquitin_N : num [1:1077] 1.045 1.01 0.997 0.99 0.998 ...
## $ pGSK3B_Tyr216_N: num [1:1077] 0.832 0.849 0.847 0.833 0.879 ...
## $ SHH_N : num [1:1077] 0.189 0.2 0.194 0.192 0.206 ...
## $ pS6_N : num [1:1077] 0.106 0.107 0.108 0.103 0.105 ...
## $ SYP_N : num [1:1077] 0.427 0.442 0.436 0.392 0.434 ...
## $ CaNA_N : num [1:1077] 1.68 1.74 1.93 1.7 1.84 ...
## $ Genotype : Factor w/ 2 levels "Control","Ts65Dn": 1 1 1 1 1 1 1 1 1 1 ...
## $ Treatment : Factor w/ 2 levels "Memantine","Saline": 1 1 1 1 1 1 1 1 1 1 ...
## $ Behavior : Factor w/ 2 levels "C/S","S/C": 1 1 1 1 1 1 1 1 1 1 ...
## $ class : Factor w/ 8 levels "c-CS-m","c-CS-s",..: 1 1 1 1 1 1 1 1 1 1 ...
diff_exp_num <- data.frame(t(select_if(diff_exp, is.numeric)))
str(diff_exp_num)
## 'data.frame': 68 obs. of 1077 variables:
## $ X1 : num 0.504 0.747 0.43 2.816 5.99 ...
## $ X2 : num 0.515 0.689 0.412 2.79 5.685 ...
## $ X3 : num 0.509 0.73 0.418 2.687 5.622 ...
## $ X4 : num 0.442 0.617 0.359 2.467 4.98 ...
## $ X5 : num 0.435 0.617 0.359 2.366 4.719 ...
## $ X6 : num 0.448 0.628 0.367 2.386 4.808 ...
## $ X7 : num 0.428 0.574 0.343 2.334 4.473 ...
## $ X8 : num 0.417 0.564 0.328 2.26 4.269 ...
## $ X9 : num 0.386 0.538 0.318 2.126 4.064 ...
## $ X10 : num 0.381 0.499 0.362 2.096 3.599 ...
## $ X11 : num 0.367 0.513 0.328 2.073 3.661 ...
## $ X12 : num 0.364 0.499 0.355 2.007 3.467 ...
## $ X13 : num 0.365 0.482 0.313 1.946 3.35 ...
## $ X14 : num 0.382 0.486 0.311 1.959 3.349 ...
## $ X15 : num 0.374 0.462 0.345 1.861 3.287 ...
## $ X16 : num 0.743 0.863 0.378 2.736 6.068 ...
## $ X17 : num 0.711 0.807 0.352 2.547 5.596 ...
## $ X18 : num 0.705 0.803 0.35 2.468 5.548 ...
## $ X19 : num 0.677 0.77 0.356 2.563 4.975 ...
## $ X20 : num 0.592 0.679 0.312 2.164 4.314 ...
## $ X21 : num 0.619 0.717 0.32 2.286 4.571 ...
## $ X22 : num 0.703 0.7 0.388 2.438 4.449 ...
## $ X23 : num 0.599 0.69 0.35 2.308 4.229 ...
## $ X24 : num 0.562 0.642 0.308 2.158 4.021 ...
## $ X25 : num 0.551 0.561 0.321 2.198 3.559 ...
## $ X26 : num 0.538 0.702 0.384 2.482 4.11 ...
## $ X27 : num 0.521 0.583 0.304 2.053 3.381 ...
## $ X28 : num 0.488 0.54 0.291 1.913 2.874 ...
## $ X29 : num 0.515 0.565 0.316 1.957 2.979 ...
## $ X30 : num 0.485 0.556 0.287 1.892 2.847 ...
## $ X31 : num 0.628 0.954 0.447 2.931 5.915 ...
## $ X32 : num 0.651 0.962 0.465 2.993 5.975 ...
## $ X33 : num 0.644 0.967 0.47 3.074 5.927 ...
## $ X34 : num 0.568 0.812 0.393 2.607 5.808 ...
## $ X35 : num 0.587 0.864 0.411 2.758 6.007 ...
## $ X36 : num 0.596 0.862 0.421 2.831 6.368 ...
## $ X37 : num 0.523 0.721 0.364 2.456 5.192 ...
## $ X38 : num 0.531 0.748 0.386 2.573 5.495 ...
## $ X39 : num 0.535 0.774 0.392 2.668 5.988 ...
## $ X40 : num 0.519 0.736 0.371 2.503 4.947 ...
## $ X41 : num 0.509 0.736 0.384 2.545 5.101 ...
## $ X42 : num 0.517 0.737 0.384 2.625 5.601 ...
## $ X43 : num 0.454 0.685 0.36 2.393 4.5 ...
## $ X44 : num 0.472 0.697 0.372 2.473 4.652 ...
## $ X45 : num 0.481 0.684 0.361 2.539 5.008 ...
## $ X46 : num 0.561 0.763 0.403 2.934 6.317 ...
## $ X47 : num 0.516 0.75 0.401 2.919 6.307 ...
## $ X48 : num 0.507 0.77 0.418 2.926 6.766 ...
## $ X49 : num 0.505 0.696 0.376 2.916 5.918 ...
## $ X50 : num 0.479 0.689 0.381 2.914 6.067 ...
## $ X51 : num 0.458 0.703 0.386 2.904 6.318 ...
## $ X52 : num 0.421 0.623 0.343 2.635 5.308 ...
## $ X53 : num 0.437 0.64 0.358 2.732 5.469 ...
## $ X54 : num 0.404 0.641 0.362 2.733 5.787 ...
## $ X55 : num 0.383 0.584 0.325 2.468 4.321 ...
## $ X56 : num 0.383 0.597 0.339 2.488 4.261 ...
## $ X57 : num 0.396 0.577 0.334 2.485 4.482 ...
## $ X58 : num 0.378 0.591 0.308 2.372 4.051 ...
## $ X59 : num 0.367 0.574 0.326 2.442 4.036 ...
## $ X60 : num 0.386 0.583 0.332 2.43 4.066 ...
## $ X61 : num 0.378 0.556 0.315 2.221 4.887 ...
## $ X62 : num 0.391 0.583 0.333 2.377 5.211 ...
## $ X63 : num 0.395 0.576 0.343 2.356 5.326 ...
## $ X64 : num 0.385 0.541 0.322 2.178 4.799 ...
## $ X65 : num 0.357 0.524 0.317 2.178 4.583 ...
## $ X66 : num 0.363 0.555 0.324 2.253 4.806 ...
## $ X67 : num 0.36 0.501 0.303 2.07 4.102 ...
## $ X68 : num 0.38 0.505 0.303 2.063 4.128 ...
## $ X69 : num 0.329 0.493 0.296 2.051 4.151 ...
## $ X70 : num 0.347 0.474 0.296 1.983 3.674 ...
## $ X71 : num 0.339 0.478 0.301 1.955 3.62 ...
## $ X72 : num 0.341 0.481 0.311 1.996 3.789 ...
## $ X73 : num 0.344 0.47 0.323 1.967 3.438 ...
## $ X74 : num 0.335 0.474 0.319 2.019 3.457 ...
## $ X75 : num 0.341 0.472 0.321 2.011 3.5 ...
## $ X76 : num 0.65 0.829 0.406 2.921 5.168 ...
## $ X77 : num 0.616 0.842 0.389 2.863 5.194 ...
## $ X78 : num 0.637 0.853 0.401 2.968 5.351 ...
## $ X79 : num 0.577 0.755 0.348 2.625 4.728 ...
## $ X80 : num 0.543 0.758 0.35 2.635 4.736 ...
## $ X81 : num 0.57 0.761 0.344 2.598 4.765 ...
## $ X82 : num 0.494 0.666 0.312 2.44 4.257 ...
## $ X83 : num 0.486 0.63 0.3 2.318 4.15 ...
## $ X84 : num 0.509 0.655 0.306 2.416 4.339 ...
## $ X85 : num 0.408 0.544 0.266 2.03 3.387 ...
## $ X86 : num 0.418 0.548 0.262 2 3.46 ...
## $ X87 : num 0.429 0.563 0.258 2.028 3.543 ...
## $ X88 : num 0.368 0.504 0.261 1.893 3.062 ...
## $ X89 : num 0.379 0.466 0.261 1.901 3.016 ...
## $ X90 : num 0.374 0.471 0.258 1.86 2.939 ...
## $ X91 : num 0.452 0.641 0.35 2.729 4.191 ...
## $ X92 : num 0.458 0.672 0.355 2.716 4.379 ...
## $ X93 : num 0.451 0.643 0.362 2.715 4.443 ...
## $ X94 : num 0.356 0.518 0.302 2.385 3.706 ...
## $ X95 : num 0.353 0.55 0.308 2.427 3.817 ...
## $ X96 : num 0.357 0.55 0.307 2.429 3.844 ...
## $ X97 : num 0.288 0.454 0.266 2.009 3.162 ...
## $ X98 : num 0.287 0.458 0.264 2.004 3.104 ...
## $ X99 : num 0.297 0.475 0.268 2.095 3.307 ...
## [list output truncated]
diff_exp_class <- rbind(diff_exp_num, class=diff_exp$class)
str(diff_exp_class)
## 'data.frame': 69 obs. of 1077 variables:
## $ X1 : num 0.504 0.747 0.43 2.816 5.99 ...
## $ X2 : num 0.515 0.689 0.412 2.79 5.685 ...
## $ X3 : num 0.509 0.73 0.418 2.687 5.622 ...
## $ X4 : num 0.442 0.617 0.359 2.467 4.98 ...
## $ X5 : num 0.435 0.617 0.359 2.366 4.719 ...
## $ X6 : num 0.448 0.628 0.367 2.386 4.808 ...
## $ X7 : num 0.428 0.574 0.343 2.334 4.473 ...
## $ X8 : num 0.417 0.564 0.328 2.26 4.269 ...
## $ X9 : num 0.386 0.538 0.318 2.126 4.064 ...
## $ X10 : num 0.381 0.499 0.362 2.096 3.599 ...
## $ X11 : num 0.367 0.513 0.328 2.073 3.661 ...
## $ X12 : num 0.364 0.499 0.355 2.007 3.467 ...
## $ X13 : num 0.365 0.482 0.313 1.946 3.35 ...
## $ X14 : num 0.382 0.486 0.311 1.959 3.349 ...
## $ X15 : num 0.374 0.462 0.345 1.861 3.287 ...
## $ X16 : num 0.743 0.863 0.378 2.736 6.068 ...
## $ X17 : num 0.711 0.807 0.352 2.547 5.596 ...
## $ X18 : num 0.705 0.803 0.35 2.468 5.548 ...
## $ X19 : num 0.677 0.77 0.356 2.563 4.975 ...
## $ X20 : num 0.592 0.679 0.312 2.164 4.314 ...
## $ X21 : num 0.619 0.717 0.32 2.286 4.571 ...
## $ X22 : num 0.703 0.7 0.388 2.438 4.449 ...
## $ X23 : num 0.599 0.69 0.35 2.308 4.229 ...
## $ X24 : num 0.562 0.642 0.308 2.158 4.021 ...
## $ X25 : num 0.551 0.561 0.321 2.198 3.559 ...
## $ X26 : num 0.538 0.702 0.384 2.482 4.11 ...
## $ X27 : num 0.521 0.583 0.304 2.053 3.381 ...
## $ X28 : num 0.488 0.54 0.291 1.913 2.874 ...
## $ X29 : num 0.515 0.565 0.316 1.957 2.979 ...
## $ X30 : num 0.485 0.556 0.287 1.892 2.847 ...
## $ X31 : num 0.628 0.954 0.447 2.931 5.915 ...
## $ X32 : num 0.651 0.962 0.465 2.993 5.975 ...
## $ X33 : num 0.644 0.967 0.47 3.074 5.927 ...
## $ X34 : num 0.568 0.812 0.393 2.607 5.808 ...
## $ X35 : num 0.587 0.864 0.411 2.758 6.007 ...
## $ X36 : num 0.596 0.862 0.421 2.831 6.368 ...
## $ X37 : num 0.523 0.721 0.364 2.456 5.192 ...
## $ X38 : num 0.531 0.748 0.386 2.573 5.495 ...
## $ X39 : num 0.535 0.774 0.392 2.668 5.988 ...
## $ X40 : num 0.519 0.736 0.371 2.503 4.947 ...
## $ X41 : num 0.509 0.736 0.384 2.545 5.101 ...
## $ X42 : num 0.517 0.737 0.384 2.625 5.601 ...
## $ X43 : num 0.454 0.685 0.36 2.393 4.5 ...
## $ X44 : num 0.472 0.697 0.372 2.473 4.652 ...
## $ X45 : num 0.481 0.684 0.361 2.539 5.008 ...
## $ X46 : num 0.561 0.763 0.403 2.934 6.317 ...
## $ X47 : num 0.516 0.75 0.401 2.919 6.307 ...
## $ X48 : num 0.507 0.77 0.418 2.926 6.766 ...
## $ X49 : num 0.505 0.696 0.376 2.916 5.918 ...
## $ X50 : num 0.479 0.689 0.381 2.914 6.067 ...
## $ X51 : num 0.458 0.703 0.386 2.904 6.318 ...
## $ X52 : num 0.421 0.623 0.343 2.635 5.308 ...
## $ X53 : num 0.437 0.64 0.358 2.732 5.469 ...
## $ X54 : num 0.404 0.641 0.362 2.733 5.787 ...
## $ X55 : num 0.383 0.584 0.325 2.468 4.321 ...
## $ X56 : num 0.383 0.597 0.339 2.488 4.261 ...
## $ X57 : num 0.396 0.577 0.334 2.485 4.482 ...
## $ X58 : num 0.378 0.591 0.308 2.372 4.051 ...
## $ X59 : num 0.367 0.574 0.326 2.442 4.036 ...
## $ X60 : num 0.386 0.583 0.332 2.43 4.066 ...
## $ X61 : num 0.378 0.556 0.315 2.221 4.887 ...
## $ X62 : num 0.391 0.583 0.333 2.377 5.211 ...
## $ X63 : num 0.395 0.576 0.343 2.356 5.326 ...
## $ X64 : num 0.385 0.541 0.322 2.178 4.799 ...
## $ X65 : num 0.357 0.524 0.317 2.178 4.583 ...
## $ X66 : num 0.363 0.555 0.324 2.253 4.806 ...
## $ X67 : num 0.36 0.501 0.303 2.07 4.102 ...
## $ X68 : num 0.38 0.505 0.303 2.063 4.128 ...
## $ X69 : num 0.329 0.493 0.296 2.051 4.151 ...
## $ X70 : num 0.347 0.474 0.296 1.983 3.674 ...
## $ X71 : num 0.339 0.478 0.301 1.955 3.62 ...
## $ X72 : num 0.341 0.481 0.311 1.996 3.789 ...
## $ X73 : num 0.344 0.47 0.323 1.967 3.438 ...
## $ X74 : num 0.335 0.474 0.319 2.019 3.457 ...
## $ X75 : num 0.341 0.472 0.321 2.011 3.5 ...
## $ X76 : num 0.65 0.829 0.406 2.921 5.168 ...
## $ X77 : num 0.616 0.842 0.389 2.863 5.194 ...
## $ X78 : num 0.637 0.853 0.401 2.968 5.351 ...
## $ X79 : num 0.577 0.755 0.348 2.625 4.728 ...
## $ X80 : num 0.543 0.758 0.35 2.635 4.736 ...
## $ X81 : num 0.57 0.761 0.344 2.598 4.765 ...
## $ X82 : num 0.494 0.666 0.312 2.44 4.257 ...
## $ X83 : num 0.486 0.63 0.3 2.318 4.15 ...
## $ X84 : num 0.509 0.655 0.306 2.416 4.339 ...
## $ X85 : num 0.408 0.544 0.266 2.03 3.387 ...
## $ X86 : num 0.418 0.548 0.262 2 3.46 ...
## $ X87 : num 0.429 0.563 0.258 2.028 3.543 ...
## $ X88 : num 0.368 0.504 0.261 1.893 3.062 ...
## $ X89 : num 0.379 0.466 0.261 1.901 3.016 ...
## $ X90 : num 0.374 0.471 0.258 1.86 2.939 ...
## $ X91 : num 0.452 0.641 0.35 2.729 4.191 ...
## $ X92 : num 0.458 0.672 0.355 2.716 4.379 ...
## $ X93 : num 0.451 0.643 0.362 2.715 4.443 ...
## $ X94 : num 0.356 0.518 0.302 2.385 3.706 ...
## $ X95 : num 0.353 0.55 0.308 2.427 3.817 ...
## $ X96 : num 0.357 0.55 0.307 2.429 3.844 ...
## $ X97 : num 0.288 0.454 0.266 2.009 3.162 ...
## $ X98 : num 0.287 0.458 0.264 2.004 3.104 ...
## $ X99 : num 0.297 0.475 0.268 2.095 3.307 ...
## [list output truncated]
# Linear model
design <- model.matrix(~ diff_exp$class)
colnames(design) <- unique(diff_exp$class)
# design[which(rowSums(design) > 1), 1] <- 0
# Fit the model
fit <- lmFit(diff_exp_num, design=design)
# Calculate the t-statistics
fit <- eBayes(fit)
topTable(fit)
## c.CS.m c.SC.m c.CS.s c.SC.s t.CS.m
## DYRK1A_N 0.4804564 0.116291899 -0.207253476 -0.205633013 0.138837508
## ITSN1_N 0.6525873 0.119807856 -0.216226495 -0.203233616 0.144419183
## BDNF_N 0.3392174 0.003097910 -0.048271661 -0.025824903 -0.026485209
## NR1_N 2.3817489 0.036060168 -0.236115640 0.023225272 -0.185207993
## NR2A_N 4.3085405 -0.028463323 -0.849124645 -0.395444131 -0.742580691
## pAKT_N 0.2299319 -0.017509145 0.011320934 0.003436030 -0.016310639
## pBRAF_N 0.1822112 -0.013855460 0.007335550 0.002763349 -0.008255450
## pCAMKII_N 2.9161867 0.019389512 1.820140497 0.445100920 0.205614089
## pCREB_N 0.1984840 0.009955155 0.009664685 0.016464780 0.004910963
## pELK_N 1.4923181 0.194525986 -0.213751816 -0.164603650 0.071587192
## t.SC.m t.CS.s t.SC.s AveExpr F P.Value
## DYRK1A_N 0.04527855 -0.150595735 -0.142968422 0.4258102 603.4185 0
## ITSN1_N 0.10696915 -0.085803836 -0.103531072 0.6171020 1184.1205 0
## BDNF_N -0.03375698 -0.018154110 -0.013631025 0.3190884 6289.9812 0
## NR1_N -0.19714249 -0.002303091 -0.133007368 2.2972691 6466.5869 0
## NR2A_N -0.79370197 -0.252317454 -0.743447562 3.8439339 2622.0866 0
## pAKT_N -0.01546641 0.039198581 0.016827145 0.2331681 5217.7245 0
## pBRAF_N -0.01741637 0.018795801 0.003106386 0.1818464 7202.6842 0
## pCAMKII_N -0.42728473 1.361070463 1.260368481 3.5371091 1552.7069 0
## pCREB_N 0.01155645 0.033304772 0.028680746 0.2125739 6407.2800 0
## pELK_N 0.02598379 -0.110802490 -0.287478586 1.4286819 1415.9086 0
## adj.P.Val
## DYRK1A_N 0
## ITSN1_N 0
## BDNF_N 0
## NR1_N 0
## NR2A_N 0
## pAKT_N 0
## pBRAF_N 0
## pCAMKII_N 0
## pCREB_N 0
## pELK_N 0
Summarizing the results and displaying the table for each protein. c-CS-m serves as a reference.
# Summarize results
diff_results <- decideTests(fit, adjust.method = "BH", p.value = 0.05)
# c-CS-m reference - summary results
summary(diff_results)[, -1]
## c-SC-m c-CS-s c-SC-s t-CS-m t-SC-m t-CS-s t-SC-s
## Down 22 30 28 39 34 22 24
## NotSig 32 17 20 19 21 12 22
## Up 14 21 20 10 13 34 22
diff_results <- as.data.frame(diff_results)
# Take a look at which proteins are upregulated (1), downregulated (-1) or not significant (0)
diff_results
## c-CS-m c-SC-m c-CS-s c-SC-s t-CS-m t-SC-m t-CS-s t-SC-s
## DYRK1A_N 1 1 -1 -1 1 0 -1 -1
## ITSN1_N 1 1 -1 -1 1 1 -1 -1
## BDNF_N 1 0 -1 -1 -1 -1 -1 -1
## NR1_N 1 0 -1 0 -1 -1 0 -1
## NR2A_N 1 0 -1 -1 -1 -1 -1 -1
## pAKT_N 1 -1 1 0 -1 -1 1 1
## pBRAF_N 1 -1 1 0 -1 -1 1 0
## pCAMKII_N 1 0 1 1 0 -1 1 1
## pCREB_N 1 1 1 1 0 1 1 1
## pELK_N 1 1 -1 -1 0 0 -1 -1
## pERK_N 1 1 -1 -1 1 0 -1 -1
## pJNK_N 1 -1 1 0 -1 -1 1 0
## PKCA_N 1 1 -1 -1 -1 -1 -1 -1
## pMEK_N 1 -1 1 0 -1 -1 1 1
## pNR1_N 1 0 -1 0 -1 -1 0 -1
## pNR2A_N 1 0 1 1 -1 0 1 0
## pNR2B_N 1 0 0 0 -1 -1 1 -1
## pPKCAB_N 1 1 -1 -1 0 0 -1 -1
## pRSK_N 1 0 0 0 1 -1 0 1
## AKT_N 1 -1 -1 1 -1 -1 1 0
## BRAF_N 1 1 -1 -1 1 0 -1 -1
## CAMKII_N 1 0 0 0 -1 -1 1 0
## CREB_N 1 0 1 0 0 -1 1 1
## ERK_N 1 0 -1 -1 -1 -1 -1 -1
## GSK3B_N 1 1 -1 -1 0 0 -1 -1
## JNK_N 1 -1 -1 -1 -1 -1 0 -1
## TRKA_N 1 0 -1 -1 -1 0 0 0
## RSK_N 1 0 0 0 0 0 1 0
## APP_N 1 0 -1 -1 1 1 1 1
## SOD1_N 1 0 1 1 0 0 1 1
## MTOR_N 1 -1 -1 0 -1 -1 0 -1
## P38_N 1 -1 1 1 -1 -1 1 1
## pMTOR_N 1 -1 1 1 -1 -1 1 0
## DSCR1_N 1 0 0 -1 -1 -1 1 0
## AMPKA_N 1 0 -1 -1 -1 -1 0 -1
## NR2B_N 1 -1 0 0 -1 -1 1 -1
## pNUMB_N 1 0 -1 -1 -1 -1 -1 -1
## RAPTOR_N 1 0 0 -1 -1 -1 1 0
## TIAM1_N 1 0 -1 -1 -1 0 1 1
## pP70S6_N 1 0 1 1 1 -1 1 1
## NUMB_N 1 0 -1 -1 -1 1 -1 0
## P70S6_N 1 0 1 0 0 1 1 -1
## pGSK3B_N 1 1 -1 -1 0 1 -1 0
## pPKCG_N 1 1 1 0 1 1 0 1
## CDK5_N 1 0 -1 -1 -1 1 -1 0
## S6_N 1 0 -1 -1 0 1 -1 1
## ADARB1_N 1 0 -1 -1 -1 0 0 -1
## AcetylH3K9_N 1 0 0 1 0 1 1 1
## RRP1_N 1 0 0 0 0 0 0 1
## BAX_N 1 1 0 1 0 1 1 1
## ARC_N 1 -1 1 1 -1 -1 1 0
## ERBB4_N 1 -1 0 1 -1 -1 1 0
## nNOS_N 1 -1 0 1 0 -1 1 0
## Tau_N 1 -1 0 1 0 1 1 1
## GFAP_N 1 1 -1 -1 -1 1 -1 0
## GluR3_N 1 0 0 0 0 -1 -1 -1
## GluR4_N 1 0 0 0 0 0 0 1
## IL1B_N 1 -1 1 1 -1 -1 1 0
## P3525_N 1 -1 -1 -1 -1 0 -1 1
## pCASP9_N 1 0 1 0 -1 0 1 0
## PSD95_N 1 -1 1 1 1 0 1 0
## SNCA_N 1 -1 1 1 -1 0 1 1
## Ubiquitin_N 1 -1 1 1 -1 0 1 1
## pGSK3B_Tyr216_N 1 0 -1 -1 0 -1 -1 0
## SHH_N 1 -1 0 1 -1 0 0 1
## pS6_N 1 -1 1 1 -1 -1 1 0
## SYP_N 1 -1 0 0 -1 -1 -1 -1
## CaNA_N 1 1 -1 -1 1 0 -1 -1